In this article, we study a partially linear single-index model for
longitudinal data under a general framework which includes both the
sparse and dense longitudinal data cases. A semiparametric estimation
method based on a combination of the local linear smoothing and
generalized estimation equations (GEE) is introduced to estimate the
two parameter vectors as well as the unknown link function. Under some
mild conditions, we derive the asymptotic properties of the proposed
parametric and nonparametric estimators in different scenarios, from
which we find that the convergence rates and asymptotic variances of
the proposed estimators for sparse longitudinal data would be
substantially different from those for dense longitudinal data. We also
discuss the estimation of the covariance (or weight) matrices involved
in the semiparametric GEE method. Furthermore, we provide some numerical
studies including Monte Carlo simulation and an empirical application to
illustrate our methodology and theory.

CHEN, LI, LIANG AND WANG

Received May 2014; revised February 2015.

1 Supported in part by NSF Grants DMS-14-40121 and DMS-14-18042 and by Award
Number 11228103, made by National Natural Science Foundation of China.

2 Supported in part by Award Number KUS-CI-016-04, made by King Abdullah
University of Science and Technology (KAUST).

AMS 2000 subject classifications. 62G09, 62H99, 62G99.

Key words and phrases. Efficiency, GEE, local linear smoothing, longitudinal data,
semiparametric estimation, single-index models.

This is an electronic reprint of the original article published by the
Institute of Mathematical Statistics in The Annals of Statistics,
2015, Vol. 43, No. 4, 1682-1715. This reprint differs from the original in
pagination and typographic detail.

1. Introduction. Consider a semiparametric partially linear single-index
model defined by

(1.1)  $Y(t) = \mathbf{Z}^T(t)\beta + \eta(\mathbf{X}^T(t)\theta) + e(t), \quad t \in \mathcal{T},$

where $\mathcal{T}$ is a bounded time interval, $\beta$ and $\theta$ are two unknown vectors of
parameters with dimensions $d$ and $p$, respectively, $\eta(\cdot)$ is an unknown link
function, $Y(t)$ is a scalar stochastic process, $\mathbf{Z}(t)$ and $\mathbf{X}(t)$ are covariates
with dimensions $d$ and $p$, respectively, and $e(t)$ is the random error process.
For the case of independent and identically distributed (i.i.d.) or weakly
dependent time series data, there has been extensive literature on statistical
inference of model (1.1) since its introduction by Carroll et al. (1997). Several
different approaches have been proposed to estimate the unknown parameters
and link function involved; see, for example, Xia, Tong and Li (1999),
Yu and Ruppert (2002), Xia and Härdle (2006), Wang et al. (2010) and Ma
and Zhu (2013). A recent paper by Liang et al. (2010) further developed
semiparametric techniques for the variable selection and model specification
testing issues in the context of model (1.1).

In this paper, we are interested in studying partially linear single-index
model (1.1) in the context of longitudinal data, which arise frequently in
many fields of research, such as biology, climatology, economics and
epidemiology, and thus have attracted considerable attention in the literature
in recent years. Various parametric models and methods have been studied
in depth for longitudinal data; see Diggle et al. (2002) and the references
therein. However, the parametric models may be misspecified in practice,
and the misspecification may lead to inconsistent estimates and incorrect
conclusions being drawn. Hence, to circumvent this issue, in recent years,
there has been a large literature on how to relax the parametric assumptions
on longitudinal data models, and many nonparametric and semiparametric
models have thus been investigated; see, for example, Lin and Ying (2001),
He, Zhu and Fung (2002), Fan and Li (2004), Wang, Carroll and Lin (2005),
Lin and Carroll (2006), Wu and Zhang (2006), Li and Hsing (2010), Jiang
and Wang (2011) and Yao and Li (2013).

Suppose that we have a random sample with $n$ subjects from model (1.1).
For the $i$th subject, $i = 1, \ldots, n$, the response variable $Y_i(t)$ and the
covariates $\{\mathbf{Z}_i(t), \mathbf{X}_i(t)\}$ are collected at random time points $t_{ij}$,
$j = 1, \ldots, m_i$, which are distributed in a bounded time interval $\mathcal{T}$
according to the probability density function $f_T(t)$. Here $m_i$ is the total
number of observations for the $i$th subject. To accommodate such longitudinal
data, model (1.1) is written in the following framework:

(1.2)  $Y_i(t_{ij}) = \mathbf{Z}_i^T(t_{ij})\beta + \eta(\mathbf{X}_i^T(t_{ij})\theta) + e_i(t_{ij})$

for $i = 1, \ldots, n$ and $j = 1, \ldots, m_i$. When $m_i$ varies across the subjects, the
longitudinal data set under investigation is unbalanced. Several nonparametric
and semiparametric models can be viewed as special cases of model (1.2).
For instance, when $\beta = 0$, model (1.2) reduces to the single-index
longitudinal data model [Jiang and Wang (2011), Chen, Gao and Li (2013a)];
when $p = 1$ and $\theta = 1$, model (1.2) reduces to the partially linear
longitudinal data model [Fan and Li (2004)]. To avoid confusion, we let $\beta_0$ and
$\theta_0$ be the true values of the two parameter vectors. For identifiability
reasons, $\theta_0$ is assumed to be a unit vector with the first nonzero element
being positive. Furthermore, we allow that there exists certain within-subject
correlation structure for $e_i(t_{ij})$, which makes the model assumption more
realistic but the development of estimation methodology more challenging.
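
As a small illustration of the identifiability constraint on the index
vector, the following sketch rescales an arbitrary nonzero vector to unit
Euclidean length and flips its sign so that the first nonzero element is
positive. The helper name `normalize_index` and the example vector are
hypothetical and introduced here only for illustration; they are not part of
the original paper.

```python
import math

def normalize_index(theta):
    """Rescale theta to unit Euclidean length with first nonzero entry positive."""
    norm = math.sqrt(sum(v * v for v in theta))
    if norm == 0.0:
        raise ValueError("theta must have at least one nonzero element")
    first_nonzero = next(v for v in theta if v != 0.0)
    sign = 1.0 if first_nonzero > 0 else -1.0
    return [sign * v / norm for v in theta]

# Illustrative input: a sign-flipped, unnormalized version of (2, 1, 2)/3.
theta_normalized = normalize_index([-2.0, -1.0, -2.0])
```

Any rescaling or sign flip of the index vector can be absorbed into the
unknown link function, which is why a constraint of this kind is needed
before the index vector can be estimated.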

To estimate the parameters $\beta_0$, $\theta_0$ as well as the link function $\eta(\cdot)$ in
model (1.2), we first apply the local linear approximation to the unknown
link function, and then introduce a profile weighted least squares approach
to estimate the two parameter vectors based on the technique of generalized
estimation equations (GEE). Under some mild conditions, we derive
the asymptotic properties of the developed parametric and nonparametric
estimators in different scenarios. Our framework is flexible in that $m_i$ can
either be bounded or tend to infinity. Thus both the dense and sparse
longitudinal data cases can be included. Dense longitudinal data means that
there exists a sequence of positive numbers $M_n$ such that $\min_i m_i \ge M_n$, and
$M_n \to \infty$ as $n \to \infty$ [see, e.g., Hall, Müller and Wang (2006) and Zhang and
Chen (2007)], whereas sparse longitudinal data means that there exists a
positive constant $M_*$ such that $\max_i m_i \le M_*$; see, for example, Yao, Müller
and Wang (2005), Wang, Qian and Carroll (2010). We show that the convergence
rates and asymptotic variances of our semiparametric estimators in
the sparse case are substantially different from those in the dense case.
Furthermore, we show that the proposed semiparametric GEE (SGEE)-based
estimators are asymptotically more efficient than the profile unweighted least
squares (PULS) estimators, when the weights in the SGEE method are chosen
as the inverse of the covariance matrix of the errors. We also introduce
a semiparametric approach to estimate the covariance matrices (or weights)
involved in the SGEE method, which is based on a variance-correlation
decomposition and consists of two steps: first, estimate the conditional variance
function using a robust nonparametric method that accommodates heavy-tailed
errors, and second, estimate the parameters in the correlation matrix.
A simulation study and a real data analysis are provided to illustrate our
methodology and theory.

The rest of the paper is organized as follows. In Section 2, we introduce
the SGEE methodology for estimating $\beta_0$, $\theta_0$ and $\eta(\cdot)$. Section 3 establishes
the large sample theory for the proposed parametric and nonparametric
estimators and gives some related discussions. Section 4 discusses how to
determine the weight matrices in the estimation equations. Section 5 gives
some numerical examples to investigate the finite sample performance of
the proposed approach. Section 6 concludes the paper. Technical assumptions
are given in Appendix A. The proofs of the main results are given in
Appendix B. Some auxiliary lemmas and their proofs are provided in the
supplementary material [Chen et al. (2015)].

2. Estimation methodology. Various semiparametric estimation
approaches have been proposed to estimate model (1.1) in the case of i.i.d.
observations (or weakly dependent time series data). See, for example, Carroll
et al. (1997) and Liang et al. (2010) for the profile likelihood method, Yu
and Ruppert (2002) and Wang et al. (2010) for the “remove-one-component”
technique using penalized spline and local linear smoothing, respectively,
and Xia and Härdle (2006) for the minimum average variance estimation
approach. However, there is limited literature on partially linear single-index
models for longitudinal data because of the more complicated structures
involved. Recently, Chen, Gao and Li (2013b) studied a partially linear
single-index longitudinal data model with individual effects. To remove the
individual effects and derive consistent semiparametric estimators, they had to
limit their discussions to the dense and balanced longitudinal data case. Ma,
Liang and Tsai (2014) considered a partially linear single-index longitudinal
data model by using polynomial splines to approximate the unknown link
function, but their discussion was limited to the sparse and balanced
longitudinal data case. In contrast, as mentioned in Section 1, our framework
includes both the sparse and dense longitudinal data cases. Meanwhile,
observations are allowed to be collected at irregular and subject-specific time
points. All this provides much wider applicability of our framework.
Furthermore, to improve the efficiency of the semiparametric estimation, we develop
a new profile weighted least squares approach to estimate the parameters
$\beta_0$, $\theta_0$ as well as the link function $\eta(\cdot)$.
To simplify the presentation, let

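$\mathbf{Y}_i = (Y_i(t_{i1}), \ldots, Y_i(t_{im_i}))^T, \quad \mathbb{X}_i = (\mathbf{X}_i(t_{i1}), \ldots, \mathbf{X}_i(t_{im_i}))^T,$

$\mathbb{Z}_i = (\mathbf{Z}_i(t_{i1}), \ldots, \mathbf{Z}_i(t_{im_i}))^T, \quad \mathbf{e}_i = (e_i(t_{i1}), \ldots, e_i(t_{im_i}))^T,$

$\boldsymbol{\eta}(\mathbb{X}_i, \theta) = (\eta(\mathbf{X}_i^T(t_{i1})\theta), \ldots, \eta(\mathbf{X}_i^T(t_{im_i})\theta))^T.$

With the above notation, model (1.2) can then be re-written as

(2.1)  $\mathbf{Y}_i = \mathbb{Z}_i\beta_0 + \boldsymbol{\eta}(\mathbb{X}_i, \theta_0) + \mathbf{e}_i.$

We further let $\mathbf{Y} = (\mathbf{Y}_1^T, \ldots, \mathbf{Y}_n^T)^T$, $\mathbb{Z} = (\mathbb{Z}_1^T, \ldots, \mathbb{Z}_n^T)^T$, $\mathbf{E} = (\mathbf{e}_1^T, \ldots, \mathbf{e}_n^T)^T$ and
$\boldsymbol{\eta}(\mathbb{X}, \theta) = (\boldsymbol{\eta}^T(\mathbb{X}_1, \theta), \ldots, \boldsymbol{\eta}^T(\mathbb{X}_n, \theta))^T$. Then model (2.1) is equivalent to

(2.2)  $\mathbf{Y} = \mathbb{Z}\beta_0 + \boldsymbol{\eta}(\mathbb{X}, \theta_0) + \mathbf{E}.$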

Our estimation procedure is based on the profile likelihood method, which
is commonly used in semiparametric estimation; see, for example, Carroll
et al. (1997), Fan and Huang (2005) and Fan, Huang and Li (2007). Let
$Y_{ij} = Y_i(t_{ij})$, $\mathbf{Z}_{ij} = \mathbf{Z}_i(t_{ij})$ and $\mathbf{X}_{ij} = \mathbf{X}_i(t_{ij})$. For given $\beta$ and $\theta$, we can
estimate $\eta(\cdot)$ and its derivative $\dot\eta(\cdot)$ at point $u$ by minimizing the following
loss function:

(2.3)  $L_n(a, b \mid \beta, \theta) = \sum_{i=1}^n \Big\{ w_i \sum_{j=1}^{m_i} \big[Y_{ij} - \mathbf{Z}_{ij}^T\beta - a - b(\mathbf{X}_{ij}^T\theta - u)\big]^2 K\Big(\frac{\mathbf{X}_{ij}^T\theta - u}{h}\Big) \Big\},$

where $K(\cdot)$ is a kernel function, $h$ is a bandwidth and $w_i$, $i = 1, \ldots, n$, are
some weights. It is well known that the local linear smoothing has advantages
over the Nadaraya-Watson kernel method, such as higher asymptotic
efficiency, design adaptation and automatic boundary correction [Fan and
Gijbels (1996)]. Following the existing literature such as Wu and Zhang (2006),
the weights $w_i$ can be specified by two schemes: $w_i = 1/T_n$ (type 1) and
$w_i = 1/(n m_i)$ (type 2), where $T_n = \sum_{i=1}^n m_i$. The type 1 weight scheme
corresponds to an equal weight for each observation, while the type 2 scheme
corresponds to an equal weight within each subject. As discussed in Huang,
Wu and Zhou (2002) and Wu and Zhang (2006), the type 2 scheme may
be appropriate if the number of observations varies across subjects. As the
longitudinal data under investigation in this paper are allowed to be
unbalanced, we use $w_i = 1/(n m_i)$, which was also used by Li and Hsing (2010)
and Kim and Zhao (2013). We denote

(2.4)  $(\hat\eta(u \mid \beta, \theta), \hat{\dot\eta}(u \mid \beta, \theta))^T = \arg\min_{a, b} L_n(a, b \mid \beta, \theta).$

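By some elementary calculations [see, e.g., Fan and Gijbels (1996)], we have

(2.5)  $\hat\eta(u \mid \beta, \theta) = \sum_{i=1}^n \mathbf{s}_i(u \mid \theta)(\mathbf{Y}_i - \mathbb{Z}_i\beta)$

for given $\beta$ and $\theta$, where

(2.6)  $\mathbf{s}_i(u \mid \theta) = (1, 0)\Big[\sum_{l=1}^n \overline{\mathbb{X}}_l^T(u \mid \theta)\mathbf{K}_l(u \mid \theta)\overline{\mathbb{X}}_l(u \mid \theta)\Big]^{-1} \overline{\mathbb{X}}_i^T(u \mid \theta)\mathbf{K}_i(u \mid \theta),$

       $\overline{\mathbb{X}}_i(u \mid \theta) = (\overline{\mathbf{X}}_{i1}(u \mid \theta), \ldots, \overline{\mathbf{X}}_{im_i}(u \mid \theta))^T,$

       $\overline{\mathbf{X}}_{ij}(u \mid \theta) = (1, \mathbf{X}_{ij}^T\theta - u)^T,$

       $\mathbf{K}_i(u \mid \theta) = \mathrm{diag}\Big(w_i K\Big(\frac{\mathbf{X}_{i1}^T\theta - u}{h}\Big), \ldots, w_i K\Big(\frac{\mathbf{X}_{im_i}^T\theta - u}{h}\Big)\Big).$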
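
The weighted local linear fit defined by (2.3) reduces to a 2 x 2 weighted
least squares problem at each point u. The sketch below is a minimal
illustration of that computation together with the two weight schemes. It
assumes an Epanechnikov kernel (the text does not fix a particular K), and
all function and variable names are hypothetical helpers introduced here
rather than anything from the paper.

```python
def epanechnikov(v):
    """Epanechnikov kernel, one common choice for K (an assumption here)."""
    return 0.75 * (1.0 - v * v) if abs(v) <= 1.0 else 0.0

def subject_weights(m, scheme="type2"):
    """Per-subject weights w_i: 1/T_n (type 1) or 1/(n m_i) (type 2)."""
    n, total = len(m), sum(m)
    if scheme == "type1":
        return [1.0 / total for _ in m]
    return [1.0 / (n * mi) for mi in m]

def local_linear(u, index, resid, weights, h, kernel=epanechnikov):
    """Minimize sum_i w_i sum_j [r_ij - a - b (x_ij - u)]^2 K((x_ij - u)/h).

    index[i][j] holds the single-index value X_ij^T theta and resid[i][j]
    the partial residual Y_ij - Z_ij^T beta. Returns (a, b), the estimates
    of the link function and of its derivative at u.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xi, ri, wi in zip(index, resid, weights):
        for x, r in zip(xi, ri):
            d = x - u
            k = wi * kernel(d / h)
            s0 += k
            s1 += k * d
            s2 += k * d * d
            t0 += k * r
            t1 += k * d * r
    det = s0 * s2 - s1 * s1
    if det <= 0.0:
        raise ValueError("too few observations near u for this bandwidth")
    # Solve the normal equations [[s0, s1], [s1, s2]] (a, b)^T = (t0, t1)^T.
    return (s2 * t0 - s1 * t1) / det, (s0 * t1 - s1 * t0) / det

# Illustrative unbalanced data with an exactly linear link 1 + 2 x.
m = [3, 5]
index = [[0.1, 0.4, 0.7], [0.0, 0.2, 0.5, 0.6, 0.9]]
resid = [[1.0 + 2.0 * x for x in xi] for xi in index]
a_hat, b_hat = local_linear(0.5, index, resid, subject_weights(m), h=0.4)
```

Because a local linear fit reproduces linear functions exactly, the example
recovers the level and slope of the illustrative link at u = 0.5 up to
rounding error, which gives a simple sanity check for an implementation.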


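Based on the profile least squares approach with the first-stage local linear
smoothing, we can construct estimators of the parameters $\beta_0$ and $\theta_0$.
We start with the PULS method which ignores the possible within-subject
correlation structure. Define the PULS loss function by

(2.7)  $Q_{n0}(\beta, \theta) = \sum_{i=1}^n [\mathbf{Y}_i - \mathbb{Z}_i\beta - \hat{\boldsymbol{\eta}}(\mathbb{X}_i \mid \beta, \theta)]^T [\mathbf{Y}_i - \mathbb{Z}_i\beta - \hat{\boldsymbol{\eta}}(\mathbb{X}_i \mid \beta, \theta)]$

       $= [\mathbf{Y} - \mathbb{Z}\beta - \hat{\boldsymbol{\eta}}(\mathbb{X} \mid \beta, \theta)]^T [\mathbf{Y} - \mathbb{Z}\beta - \hat{\boldsymbol{\eta}}(\mathbb{X} \mid \beta, \theta)],$

where, for given $\beta$ and $\theta$, $\hat{\boldsymbol{\eta}}(\mathbb{X}_i \mid \beta, \theta)$ and $\hat{\boldsymbol{\eta}}(\mathbb{X} \mid \beta, \theta)$ are the local linear
estimators of the vectors $\boldsymbol{\eta}(\mathbb{X}_i, \theta)$ and $\boldsymbol{\eta}(\mathbb{X}, \theta)$, respectively; that is, each
element of $\hat{\boldsymbol{\eta}}(\mathbb{X}_i \mid \beta, \theta)$ and $\hat{\boldsymbol{\eta}}(\mathbb{X} \mid \beta, \theta)$ is defined as in (2.5). The PULS
estimators of $\beta_0$ and $\theta_0$ are obtained by minimizing the loss function $Q_{n0}(\beta, \theta)$
with respect to $\beta$ and $\theta$ and normalizing the minimizer for $\theta$. We denote the
resulting estimators by $\tilde\beta$ and $\tilde\theta$, respectively.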
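
To make the profiling step concrete, the sketch below evaluates a PULS-type
loss for given parameter values: it forms the single-index values and the
partial residuals, smooths the residuals by a local linear fit at every
observed index value, and sums the squared differences. It is only a sketch
under stated assumptions: a Gaussian kernel, the type 2 weights, simulated
noise-free data with an illustrative linear link, and hypothetical function
names. The minimization over the parameters is not shown.

```python
import math
import random

def gauss_kernel(v):
    return math.exp(-0.5 * v * v)

def fit_eta(u, idx, res, wts, h):
    """Local linear estimate of the link at u from flattened observations."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, r, w in zip(idx, res, wts):
        d = x - u
        k = w * gauss_kernel(d / h)
        s0 += k
        s1 += k * d
        s2 += k * d * d
        t0 += k * r
        t1 += k * d * r
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def puls_loss(beta, theta, Y, Z, X, h):
    """Unweighted profile loss: Y[i][j] scalar, Z[i][j] and X[i][j] lists."""
    n = len(Y)
    idx, res, wts = [], [], []
    for Yi, Zi, Xi in zip(Y, Z, X):
        wi = 1.0 / (n * len(Yi))          # type 2 weight for subject i
        for y, z, x in zip(Yi, Zi, Xi):
            idx.append(dot(x, theta))     # single-index value
            res.append(y - dot(z, beta))  # partial residual
            wts.append(wi)
    return sum((r - fit_eta(u, idx, res, wts, h)) ** 2 for u, r in zip(idx, res))

# Illustrative noise-free data with link 1 + 2 u (hypothetical values).
rng = random.Random(0)
beta_true, theta_true = [2.0, 1.0], [0.6, 0.8]
Y, Z, X = [], [], []
for m_i in [3, 4, 5, 3]:
    Zi = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(m_i)]
    Xi = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(m_i)]
    Y.append([dot(z, beta_true) + 1.0 + 2.0 * dot(x, theta_true)
              for z, x in zip(Zi, Xi)])
    Z.append(Zi)
    X.append(Xi)

loss_at_truth = puls_loss(beta_true, theta_true, Y, Z, X, h=0.8)
loss_elsewhere = puls_loss([0.0, 0.0], theta_true, Y, Z, X, h=0.8)
```

With a linear link and no noise the loss vanishes at the true parameter
values up to rounding error, while a wrong linear coefficient leaves
structure in the residuals that the smoother cannot absorb.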

Although it is easy to verify that both $\tilde\beta$ and $\tilde\theta$ are consistent, they are
not efficient as the within-subject correlation structure is not taken into
account. Hence, to improve the efficiency of the parametric estimators, we
next introduce a GEE-based method to estimate the parameters $\beta_0$ and
$\theta_0$. Existing literature on GEE-based methods in longitudinal data analysis
includes Liang and Zeger (1986), Xie and Yang (2003) and Wang (2011). Let
$\mathbf{W} = \mathrm{diag}\{\mathbf{W}_1, \ldots, \mathbf{W}_n\}$, where $\mathbf{W}_i = \mathbf{R}_i^{-1}$ and $\mathbf{R}_i$ is an $m_i \times m_i$ working
covariance matrix whose estimation will be discussed in Section 4. Define

$\rho_Z(\mathbb{X}_i, \theta) = (\rho_Z(\mathbf{X}_{i1}^T\theta \mid \theta), \ldots, \rho_Z(\mathbf{X}_{im_i}^T\theta \mid \theta))^T, \quad \rho_Z(u \mid \theta) = E[\mathbf{Z}_{ij} \mid \mathbf{X}_{ij}^T\theta = u],$

$\rho_X(\mathbb{X}_i, \theta) = (\rho_X(\mathbf{X}_{i1}^T\theta \mid \theta), \ldots, \rho_X(\mathbf{X}_{im_i}^T\theta \mid \theta))^T, \quad \rho_X(u \mid \theta) = E[\mathbf{X}_{ij} \mid \mathbf{X}_{ij}^T\theta = u],$

$\Lambda_i(\theta) = \big(\mathbb{Z}_i - \rho_Z(\mathbb{X}_i, \theta), \; [\dot{\boldsymbol{\eta}}(\mathbb{X}_i, \theta) \otimes \mathbf{1}_p^T] \odot [\mathbb{X}_i - \rho_X(\mathbb{X}_i, \theta)]\big),$

where $\dot{\boldsymbol{\eta}}(\mathbb{X}_i, \theta)$ is a column vector with its elements being the derivatives of
$\eta(\cdot)$ at points $\mathbf{X}_{ij}^T\theta$, $j = 1, \ldots, m_i$, $\mathbf{1}_p$ is a $p$-dimensional vector of ones, $\otimes$
is the Kronecker product and $\odot$ denotes the componentwise product. The
construction of the parametric estimators is based on solving the following
equation with respect to $\beta$ and $\theta$:

(2.8)  $\sum_{i=1}^n \hat\Lambda_i^T(\theta)\mathbf{W}_i[\mathbf{Y}_i - \mathbb{Z}_i\beta - \hat{\boldsymbol{\eta}}(\mathbb{X}_i \mid \beta, \theta)] = \mathbf{0},$

where $\hat\Lambda_i(\theta)$ is an estimator of $\Lambda_i(\theta)$ with $\rho_Z(\mathbb{X}_i, \theta)$, $\rho_X(\mathbb{X}_i, \theta)$ and $\dot{\boldsymbol{\eta}}(\mathbb{X}_i, \theta)$
replaced by their corresponding local linear estimated values. Let $\hat\beta$ and $\hat\theta_1$
be the solutions to the estimation equations in (2.8), and let the SGEE-based
estimator of $\theta_0$ be defined as $\hat\theta = \hat\theta_1/\|\hat\theta_1\|$, where $\|\cdot\|$ is the Euclidean
norm. Note that the solutions to the equations in (2.8) generally do not
have a closed form. In the numerical studies, we use the trust-region dogleg
algorithm within the Matlab command “fsolve” to obtain the solutions to
(2.8). Corollary 1 below shows that the SGEE-based estimators $\hat\beta$ and $\hat\theta$ are
generally asymptotically more efficient than the PULS estimators $\tilde\beta$ and $\tilde\theta$,
when the weights are chosen appropriately.

Replacing $\beta$ and $\theta$ in $\hat\eta(\cdot \mid \beta, \theta)$ by $\hat\beta$ and $\hat\theta$, respectively, we obtain the local
linear estimator of the link function $\eta(\cdot)$ at $u$ as

(2.9)  $\hat\eta(u) = \hat\eta(u \mid \hat\beta, \hat\theta) = \sum_{i=1}^n \mathbf{s}_i(u \mid \hat\theta)(\mathbf{Y}_i - \mathbb{Z}_i\hat\beta).$

In Section 3 below, we will give the large sample properties of the
estimators proposed above, and in Section 4, we will discuss how to choose the
working covariance matrix $\mathbf{R}_i$.


3. Theoretical properties. Before establishing the large sample theory
for the proposed parametric and nonparametric estimators, we introduce
some notation. Let $\mathbf{B}_0$ be a $p \times (p - 1)$ matrix such that $\mathbf{M} = (\theta_0, \mathbf{B}_0)$ is a
$p \times p$ orthogonal matrix, and define

$\mathbf{I}(\mathbf{B}_0) = \begin{pmatrix} \mathbf{I}_d & \mathbf{O}_{d \times (p-1)} \\ \mathbf{O}_{p \times d} & \mathbf{B}_0 \end{pmatrix},$

where $\mathbf{I}_k$ is a $k \times k$ identity matrix and $\mathbf{O}_{k \times l}$ is a $k \times l$ null matrix. Let
$\Lambda_i = \Lambda_i(\theta_0)$, and assume that there exist two positive semi-definite matrices
$\Omega_0$ and $\Omega_1$ as well as a sequence of numbers $\omega_n$ such that $\omega_n \to \infty$,

(3.1)  $\frac{1}{\omega_n}\sum_{i=1}^n \Lambda_i^T\mathbf{W}_i\Lambda_i \stackrel{P}{\longrightarrow} \Omega_0,$

(3.2)  $\frac{1}{\omega_n}\sum_{i=1}^n E[\Lambda_i^T\mathbf{W}_i\mathbf{e}_i\mathbf{e}_i^T\mathbf{W}_i\Lambda_i] \to \Omega_1,$

(3.3)  $\max_{1 \le i \le n} E[\Lambda_i^T\mathbf{W}_i\mathbf{e}_i\mathbf{e}_i^T\mathbf{W}_i\Lambda_i] = o(\omega_n),$

as $n \to \infty$, and $\mathbf{I}^T(\mathbf{B}_0)\Omega_0\mathbf{I}(\mathbf{B}_0)$ is positive definite. Conditions (3.2) and
(3.3) ensure that the Lindeberg-Feller condition can be satisfied, and thus
the classical central limit theorem for independent sequences [Petrov (1995)]
is applicable. It is not difficult to verify the assumption in (3.3) for the dense
and sparse longitudinal data. In particular, (3.3) excludes the case where
the term $\Lambda_i^T\mathbf{W}_i\mathbf{e}_i$ from one or a few subjects dominates those from the
others. For the latter case, it may be possible to derive the consistency of the
proposed parametric estimation, but the proof of the asymptotic normality
would be difficult. Let $\Omega_0^+$ be the Moore-Penrose inverse matrix of $\Omega_0$,
which is defined as $\Omega_0^+ = \mathbf{I}(\mathbf{B}_0)[\mathbf{I}^T(\mathbf{B}_0)\Omega_0\mathbf{I}(\mathbf{B}_0)]^{-1}\mathbf{I}^T(\mathbf{B}_0)$. We next give the
asymptotic distribution theory for the SGEE-based estimators $\hat\beta$ and $\hat\theta$.

Theorem 1. Suppose that Assumptions 1-5 in Appendix A and (3.1)-(3.3)
are satisfied. Then we have

(3.4)  $\omega_n^{1/2}\begin{pmatrix} \hat\beta - \beta_0 \\ \hat\theta - \theta_0 \end{pmatrix} \stackrel{d}{\longrightarrow} N(\mathbf{0}, \Omega_0^+\Omega_1\Omega_0^+)$

as $n \to \infty$.


Remark 1. Theorem 1 establishes the asymptotically normal distribution
theory for $\hat\beta$ and $\hat\theta$ with convergence rate $\omega_n^{1/2}$. This $\omega_n$ is linked to $h$
through $n$ in a certain way. Specifically, the condition $\omega_n h^6 \to 0$ in
Assumption 5 needs to be satisfied to ensure that the bias term of the parametric
estimation is asymptotically negligible. The specific forms of $\omega_n$, $\Omega_0$ and
$\Omega_1$ can be derived for some particular cases, for instance, when longitudinal
data are balanced, that is, $m_i \equiv m$, $\omega_n = nm$. Furthermore, assume that the
covariates and the error are i.i.d. with $E[e_i^2(t_{ij})] = \sigma^2$, $e_i(t_{ij})$ is independent
of the covariates and $\mathbf{W}_i$, $i = 1, \ldots, n$, are $m \times m$ identity matrices. Then
we can show that

$\Omega_0 = \begin{pmatrix} \Omega_0(1) & \Omega_0(2) \\ \Omega_0^T(2) & \Omega_0(3) \end{pmatrix} \quad\text{and}\quad \Omega_1 = \sigma^2\begin{pmatrix} \Omega_0(1) & \Omega_0(2) \\ \Omega_0^T(2) & \Omega_0(3) \end{pmatrix},$

where

$\Omega_0(1) = E\{[\mathbf{Z}(t) - \rho_Z(\mathbf{X}^T(t)\theta_0 \mid \theta_0)][\mathbf{Z}(t) - \rho_Z(\mathbf{X}^T(t)\theta_0 \mid \theta_0)]^T\},$

$\Omega_0(2) = E\{\dot\eta(\mathbf{X}^T(t)\theta_0)[\mathbf{Z}(t) - \rho_Z(\mathbf{X}^T(t)\theta_0 \mid \theta_0)][\mathbf{X}(t) - \rho_X(\mathbf{X}^T(t)\theta_0 \mid \theta_0)]^T\},$

$\Omega_0(3) = E\{[\dot\eta(\mathbf{X}^T(t)\theta_0)]^2[\mathbf{X}(t) - \rho_X(\mathbf{X}^T(t)\theta_0 \mid \theta_0)][\mathbf{X}(t) - \rho_X(\mathbf{X}^T(t)\theta_0 \mid \theta_0)]^T\}.$

Hence $\Omega_0^+\Omega_1\Omega_0^+$ reduces to $\sigma^2\Omega_0^+$.

In Theorem 1 above, we only require $n \to \infty$. As mentioned in Section 1,
both the sparse and dense longitudinal data cases can be included in a unified
framework. For the sparse longitudinal data case when $m_i$ is bounded by a
certain positive constant, we can take $\omega_n = n$ and prove that (3.4) holds. For
the dense longitudinal data case where $\min_i m_i \ge M_n$ with $M_n \to \infty$, under
some regularity conditions we may prove (3.4) with $\omega_n = \sum_{i=1}^n m_i$. As more
observations are available in the dense longitudinal data case and the order
for the total number of the observations is higher than $n$, the convergence
rate for the parametric estimators is faster than the well-known root-$n$ rate
in the sparse longitudinal data case.


Using Theorem 1, we can obtain the following corollary.

Corollary 1. Suppose that the weights $\mathbf{W}_i$ in (2.8) are chosen as
the inverse of the conditional covariance matrix of $\mathbf{e}_i$, and the conditions
of Theorem 1 are satisfied. Then the SGEE-based estimators $\hat\beta$ and $\hat\theta$ are
asymptotically more efficient than the PULS estimators $\tilde\beta$ and $\tilde\theta$ defined in
Section 2.

Remark 2. In the proof of the above corollary, we show that the asymptotic
covariance matrix of the PULS estimators $\tilde\beta$ and $\tilde\theta$ (after appropriate
normalization) minus that of the SGEE-based estimators $\hat\beta$ and $\hat\theta$ is positive
semi-definite, although the two estimation methods have the same convergence
rates. That is, under the conditions assumed in Theorem 1, the limit
matrix of $\omega_n[\mathrm{Var}(\tilde\beta, \tilde\theta) - \mathrm{Var}(\hat\beta, \hat\theta)]$ is positive semi-definite. For the case of
independent observations, a recent paper by Luo, Li and Yin (2014)
discussed the efficiency bound for the semiparametric estimation in single-index
models. Following their idea, we conjecture that modification of our
estimation procedure may be needed to obtain the efficient estimation in the
partially linear single-index longitudinal data models. We will study this
issue in our future research.
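
The efficiency ordering in Corollary 1 has a familiar analogue in a plain
linear model, which the sketch below illustrates. It is not the paper's proof
and does not involve the link function: it assumes a scalar slope, two
observations per subject, a common known 2 x 2 error covariance, and made-up
design vectors. For identity weights the sandwich form gives the variance of
the unweighted estimator, while for weights equal to the inverse covariance
the two middle and outer terms coincide and the variance collapses to a
single inverse.

```python
def quad(a, M):
    """Quadratic form a^T M a for a length-2 vector a and a 2 x 2 matrix M."""
    return sum(a[r] * M[r][c] * a[c] for r in range(2) for c in range(2))

def inv2(M):
    """Inverse of a 2 x 2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

def sandwich_variances(designs, R):
    """Variances of the unweighted (W = I) and weighted (W = R^{-1}) estimators.

    designs holds one length-2 covariate vector per subject and R is the
    error covariance shared by all subjects (an illustrative assumption).
    """
    identity = [[1.0, 0.0], [0.0, 1.0]]
    outer = sum(quad(a, identity) for a in designs)   # analogue of Omega_0, W = I
    middle = sum(quad(a, R) for a in designs)         # analogue of Omega_1, W = I
    var_unweighted = middle / outer ** 2
    var_weighted = 1.0 / sum(quad(a, inv2(R)) for a in designs)
    return var_unweighted, var_weighted

R_example = [[1.0, 0.8], [0.8, 1.0]]                  # strong within-subject correlation
designs_example = [[1.0, -1.0], [0.5, 2.0], [1.5, 0.3]]
var_unweighted, var_weighted = sandwich_variances(designs_example, R_example)
```

In this toy setting the weighted variance never exceeds the unweighted one,
and the two agree when the errors are uncorrelated with equal variances, which
mirrors the positive semi-definite difference described in Remark 2.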

To establish the asymptotic distribution theory for the nonparametric
estimator $\hat\eta(u)$ under a unified framework, we assume that there exist a
sequence $\varphi_n(h)$ and a constant $0 < \sigma_*^2 < \infty$ such that

(3.5)  $\varphi_n(h) = o(\omega_n), \quad \varphi_n(h)\max_{1 \le i \le n} E[\mathbf{s}_i(u \mid \theta_0)\mathbf{e}_i\mathbf{e}_i^T\mathbf{s}_i^T(u \mid \theta_0)] = o(1)$

and

(3.6)  $\varphi_n(h)\sum_{i=1}^n E[\mathbf{s}_i(u \mid \theta_0)\mathbf{e}_i\mathbf{e}_i^T\mathbf{s}_i^T(u \mid \theta_0)] \to \sigma_*^2.$

The first restriction in (3.5) is imposed to ensure that the parametric
convergence rates are faster than the nonparametric convergence rates, and
the second restriction in (3.5) and the condition in (3.6) are imposed for
the derivation of the asymptotic variance of the local linear estimator $\hat\eta(u)$
and the satisfaction of the Lindeberg-Feller condition. The specific forms of
$\varphi_n(h)$ and $\sigma_*^2$ will be discussed in Remark 3 below. Let $\mu_j = \int v^j K(v)\,dv$ for
$j = 0, 1, 2$ and $\ddot\eta_0(\cdot)$ be the second-order derivative of $\eta_0(\cdot)$.

Theorem 2. Suppose that the conditions of Theorem 1, (3.5) and (3.6)
are satisfied. Then we have

(3.7)  $\varphi_n^{1/2}(h)[\hat\eta(u) - \eta_0(u) - b_\eta(u)h^2] \stackrel{d}{\longrightarrow} N(0, \sigma_*^2),$

where $b_\eta(u) = \ddot\eta_0(u)\mu_2/2$.

Remark 3. Theorem 2 provides the asymptotically normal distribution
theory for the nonparametric estimator $\hat\eta(u)$ with a convergence rate
$O_P(\varphi_n^{-1/2}(h) + h^2)$. The forms of $\varphi_n(h)$ and $\sigma_*^2$ in Theorem 2 depend on the
type of the longitudinal data under study, that is, whether it is sparse or
dense. We can derive their specific forms for some particular cases. Consider,
for example, the case where $e_i(t_{ij}) = v_i + \varepsilon_{ij}$, in which $\varepsilon_{ij}$ are i.i.d. across
both $i$ and $j$ with $E[\varepsilon_{ij}] = 0$ and $E[\varepsilon_{ij}^2] = \sigma_\varepsilon^2$, and $\{v_i\}$ is an i.i.d. sequence of
random variables with $E[v_i] = 0$ and $E[v_i^2] = \sigma_v^2$ and is independent of $\{\varepsilon_{ij}\}$.
In this case, we note that

$E\Big\{\Big[\sum_{j=1}^{m_i} K\Big(\frac{\mathbf{X}_{ij}^T\theta_0 - u}{h}\Big)e_i(t_{ij})\Big]^2\Big\} = E\Big\{\Big[\sum_{j=1}^{m_i} K\Big(\frac{\mathbf{X}_{ij}^T\theta_0 - u}{h}\Big)(v_i + \varepsilon_{ij})\Big]^2\Big\}$

$= \sum_{j=1}^{m_i} E\Big[K^2\Big(\frac{\mathbf{X}_{ij}^T\theta_0 - u}{h}\Big)(v_i + \varepsilon_{ij})^2\Big] + \sum_{j_1 \ne j_2} E\Big[K\Big(\frac{\mathbf{X}_{ij_1}^T\theta_0 - u}{h}\Big)K\Big(\frac{\mathbf{X}_{ij_2}^T\theta_0 - u}{h}\Big)(v_i + \varepsilon_{ij_1})(v_i + \varepsilon_{ij_2})\Big]$

$\sim m_i h\nu_0 f_{\theta_0}(u)(\sigma_v^2 + \sigma_\varepsilon^2) + m_i(m_i - 1)h^2\mu_0^2 f_{\theta_0}^2(u)\sigma_v^2,$

where $\nu_0 = \int K^2(v)\,dv$ and $f_{\theta_0}(\cdot)$ is the probability density function of $\mathbf{X}_{ij}^T\theta_0$.

For the sparse longitudinal data case, $m_i(m_i - 1)h^2\mu_0^2 f_{\theta_0}^2(u)\sigma_v^2$ is
dominated by $m_i h\nu_0 f_{\theta_0}(u)(\sigma_v^2 + \sigma_\varepsilon^2)$, as $m_i$ is bounded and $h \to 0$. Then, by
Lemma 1 in the supplementary document [Chen et al. (2015)] and some
elementary calculations, we can prove that

(3.8)  $\sum_{i=1}^n E[\mathbf{s}_i(u \mid \theta_0)\mathbf{e}_i\mathbf{e}_i^T\mathbf{s}_i^T(u \mid \theta_0)] \sim \frac{1}{(nh)^2}\sum_{i=1}^n \frac{m_i h\nu_0(\sigma_v^2 + \sigma_\varepsilon^2)}{m_i^2 f_{\theta_0}(u)} = \frac{\nu_0(\sigma_v^2 + \sigma_\varepsilon^2)}{n^2 h f_{\theta_0}(u)}\sum_{i=1}^n \frac{1}{m_i}.$

Hence, in this case, we can take $\varphi_n(h) = (n^2 h)(\sum_{i=1}^n 1/m_i)^{-1}$, which has the
same order as $nh$, and $\sigma_*^2 = \nu_0(\sigma_v^2 + \sigma_\varepsilon^2)/f_{\theta_0}(u)$. This result is similar to
Theorem 1(i) in Kim and Zhao (2013).

For the dense longitudinal data case, $m_i h\nu_0 f_{\theta_0}(u)(\sigma_v^2 + \sigma_\varepsilon^2)$ is dominated
by $m_i(m_i - 1)h^2\mu_0^2 f_{\theta_0}^2(u)\sigma_v^2$ if we assume that $m_i h \to \infty$. Then, again by
Lemma 1 in the supplementary material [Chen et al. (2015)], we can prove
that

$\sum_{i=1}^n E[\mathbf{s}_i(u \mid \theta_0)\mathbf{e}_i\mathbf{e}_i^T\mathbf{s}_i^T(u \mid \theta_0)] \sim \frac{1}{(nh)^2}\sum_{i=1}^n \frac{m_i(m_i - 1)h^2\mu_0^2\sigma_v^2}{m_i^2} \sim \frac{\mu_0^2\sigma_v^2}{n}.$

Hence, in this case, we can take $\varphi_n(h) = n$ and $\sigma_*^2 = \mu_0^2\sigma_v^2$, which are
analogous to those in Theorem 1(ii) of Kim and Zhao (2013) and quite different
from those in the sparse longitudinal data case.
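
The two normalizing sequences in Remark 3 are easy to compare numerically.
The sketch below codes the sparse-case and dense-case expressions for the
random-effect error structure of that remark and converts them into the
large-sample standard deviation implied by Theorem 2. The function names and
all numerical inputs (kernel constants, variances, density value, bandwidth,
numbers of observations) are hypothetical values chosen for illustration.

```python
import math

def phi_sparse(m, h):
    """Sparse case: n^2 h divided by the sum of 1/m_i."""
    n = len(m)
    return n * n * h / sum(1.0 / mi for mi in m)

def phi_dense(m):
    """Dense case: the normalizing sequence is simply n."""
    return float(len(m))

def sigma_star_sq_sparse(nu0, sigma_v2, sigma_eps2, f_u):
    """Sparse-case limit variance nu0 (sigma_v^2 + sigma_eps^2) / f(u)."""
    return nu0 * (sigma_v2 + sigma_eps2) / f_u

def sigma_star_sq_dense(mu0, sigma_v2):
    """Dense-case limit variance mu0^2 sigma_v^2."""
    return mu0 ** 2 * sigma_v2

def approx_sd(sigma_star_sq, phi):
    """Approximate standard deviation of the link estimator at a point."""
    return math.sqrt(sigma_star_sq / phi)

# Illustrative inputs: nu0 = 0.6 and mu0 = 1 hold for the Epanechnikov kernel.
sparse_sd = approx_sd(sigma_star_sq_sparse(0.6, 0.5, 0.5, 0.4),
                      phi_sparse([4] * 50, h=0.1))
dense_sd = approx_sd(sigma_star_sq_dense(1.0, 0.5), phi_dense([200] * 50))
```

For balanced sparse data with m observations per subject the sparse-case
sequence equals n m h, so the bandwidth enters the variance; in the dense
case neither the bandwidth nor the number of observations per subject
appears to first order, only the between-subject variance.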


4. Estimation of covariance matrices. Estimation of the weight or working
covariance matrices, which are involved in the SGEE (2.8), is critical to
improving the efficiency of the proposed semiparametric estimators.
However, the unbalanced longitudinal data structure, which can be either sparse
or dense, makes such covariance matrix estimation very challenging, and
some existing estimation methods based on balanced data [such as Wang
(2011)] cannot be directly used here. In this section, we introduce a
semiparametric estimation approach that is applicable to both sparse and dense
unbalanced longitudinal data. This approach is based on a variance-correlation
decomposition, and the estimation of the working covariance matrices then
consists of two steps: first, estimate the conditional variance function
using a robust nonparametric method that accommodates heavy-tailed errors,
and second, estimate the parameters in the correlation matrix. For recent
developments on the study of the covariance structure in longitudinal data
analysis, we refer to Fan and Wu (2008), Zhang, Leng and Tang (2015) and
the references therein.

For each $1 \le i \le n$, let $\mathbf{R}_i$ be the covariance matrix of $\mathbf{e}_i$ and

$\Sigma_i = \mathrm{diag}\{\sigma^2(t_{i1}), \ldots, \sigma^2(t_{im_i})\}$

with $\sigma^2(t_{ij}) = E[e_i^2(t_{ij}) \mid t_{ij}] = E[e_i^2(t_{ij}) \mid t_{ij}, \mathbf{X}_i(t_{ij}), \mathbf{Z}_i(t_{ij})]$ for $j = 1, \ldots, m_i$,
and $\mathbf{C}_i$ be the correlation matrix of $\mathbf{e}_i$. Assume that there exists a
$q$-dimensional parameter vector $\phi$ such that $\mathbf{C}_i = \mathbf{C}_i(\phi)$ where $\mathbf{C}_i(\cdot)$,
$1 \le i \le n$, are pre-specified. By the variance-correlation decomposition, we have

(4.1)  $\mathbf{R}_i = \Sigma_i^{1/2}\mathbf{C}_i(\phi)\Sigma_i^{1/2}.$

The above semiparametric covariance structure has been studied in some of
the existing literature [see, e.g., Fan, Huang and Li (2007) and Fan and Wu
(2008)] and provides a flexible framework to capture the error covariance
structure, especially when the dimension of $\phi$ is large. For example, it is
satisfied when $e_i(t_{ij})$ has the AR(1) or ARMA(1,1) dependence structure
for each $i$; see, for example, the simulated example in Section 5.1. When
$e_i(t_{ij}) = \sigma(t_{ij})(v_i + \varepsilon_{ij})$ in which $v_i$ and $\varepsilon_{ij}$ satisfy the conditions discussed
in Remark 3 and $\sigma_v^2 + \sigma_\varepsilon^2 = 1$, we can also show that the semiparametric
covariance structure is satisfied with $\phi$ being $\sigma_v^2$ or $\sigma_\varepsilon^2$. Some existing
papers such as Wu and Pourahmadi (2003) suggest the use of a nonparametric
smoothing method to estimate the covariance matrix. However, they usually
need to assume that the longitudinal data are balanced or nearly balanced,
which would be violated when the data are collected at irregular and
possibly subject-specific time points. Yao, Müller and Wang (2005) proposed the
approach of functional data analysis to estimate the covariance structure
for sparse and irregularly-spaced longitudinal data. However, some
substantial modification may be needed to extend the method of Yao, Müller and
Wang (2005) to our framework, which includes both the sparse and dense
longitudinal data.
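
The decomposition in (4.1) can be sketched in a few lines. The example below
builds a working covariance matrix for one subject from a vector of
conditional variances and a one-parameter correlation matrix. The
correlation form, with entry equal to the parameter raised to the absolute
time difference, is an assumption made here as a convenient AR(1)-type choice
for irregular observation times; the function names and input values are
hypothetical.

```python
import math

def ar1_correlation(times, rho):
    """Correlation matrix with (j, k) entry rho ** |t_j - t_k| (assumed form)."""
    return [[rho ** abs(s - t) for t in times] for s in times]

def working_covariance(variances, corr):
    """Variance-correlation decomposition: diag(sd) * corr * diag(sd)."""
    sd = [math.sqrt(v) for v in variances]
    m = len(sd)
    return [[sd[j] * corr[j][k] * sd[k] for k in range(m)] for j in range(m)]

# One illustrative subject observed at three irregular times.
obs_times = [0.0, 1.0, 3.0]
cond_variances = [1.0, 4.0, 9.0]
R_subject = working_covariance(cond_variances, ar1_correlation(obs_times, 0.5))
```

Separating the variance function from the correlation parameter in this way
is what allows the two pieces to be estimated in two steps, with a
nonparametric fit for the former and a low-dimensional search for the latter.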

In the present paper, we first estimate the conditional variance function
$\sigma^2(\cdot)$ in the diagonal matrix $\Sigma_i$ by using a nonparametric method. In recent
years, there has been a rich literature on the study of nonparametric
conditional variance estimation; see, for example, Fan and Yao (1998), Yu and
Jones (2004), Fan, Huang and Li (2007) and Leng and Tang (2011). However,
when the errors are heavy-tailed, which is not uncommon in economic and
financial data analysis, most of these existing methods may not perform
well. This motivates us to devise an estimation method that is robust to
heavy-tailed errors. Let $r(t_{ij}) = [Y_{ij} - \mathbf{Z}_{ij}^T\beta_0 - \eta(\mathbf{X}_{ij}^T\theta_0)]^2$. We can then find
a random variable $\xi(t_{ij})$ so that $r(t_{ij}) = \sigma^2(t_{ij})\xi^2(t_{ij})$ and $E[\xi^2(t_{ij}) \mid t_{ij}] = 1$
with probability one. By applying the log-transformation [see Peng and Yao
(2003) and Chen, Cheng and Peng (2009) for the application of this
transformation in time series analysis] to $r(t_{ij})$, we have

(4.2)  $\log r(t_{ij}) = \log[\tau\sigma^2(t_{ij})] + \log[\tau^{-1}\xi^2(t_{ij})] \equiv \sigma_\diamond^2(t_{ij}) + \xi_\diamond(t_{ij}),$

where $\tau$ is a positive constant such that $E[\xi_\diamond(t_{ij})] = E\{\log[\tau^{-1}\xi^2(t_{ij})]\} = 0$.
Here, $\xi_\diamond(t_{ij})$ could be viewed as an error term in model (4.2). As $r_{ij} \equiv r(t_{ij})$
are unobservable, we replace them with

$\hat r_{ij} = [Y_{ij} - \mathbf{Z}_{ij}^T\tilde\beta - \hat\eta(\mathbf{X}_{ij}^T\tilde\theta \mid \tilde\beta, \tilde\theta)]^2,$

where $\tilde\beta$ and $\tilde\theta$ are the PULS estimators of $\beta_0$ and $\theta_0$, respectively. In order
to estimate $\sigma_\diamond^2(t)$, we define

(4.3)  $L_n(a, b) = \sum_{i=1}^n \Big\{ w_i\sum_{j=1}^{m_i} [\log(\hat r_{ij} + \zeta_n) - a - b(t_{ij} - t)]^2 K_1\Big(\frac{t_{ij} - t}{h_1}\Big) \Big\},$

where $K_1(\cdot)$ is a kernel function, $h_1$ is a bandwidth satisfying Assumption 9
in Appendix A, $w_i = 1/(n m_i)$ as in Section 2 and $\zeta_n \to 0$ as $n \to \infty$.
Throughout this paper, we set $\zeta_n = 1/T_n$, where $T_n = \sum_{i=1}^n m_i$. The $\zeta_n$ is added in
$\log(\hat r_{ij} + \zeta_n)$ to avoid the occurrence of invalid $\log 0$ as $\zeta_n > 0$ for any $n$.
Such a modification would not affect the asymptotic distribution of the
conditional variance estimation under certain mild restrictions. Then $\sigma_\diamond^2(t)$ can
be estimated as

(4.4)  $\hat\sigma_\diamond^2(t) = \hat a, \quad\text{where } (\hat a, \hat b)^T = \arg\min_{a, b} L_n(a, b).$

On the other hand, noting that $\exp\{\sigma_\diamond^2(t_{ij})\}\xi^2(t_{ij})/\tau = r_{ij}$ and $E[\xi^2(t_{ij})] = 1$,
the constant $\tau$ can be estimated by

(4.5)  $\hat\tau = \Big[\frac{1}{T_n}\sum_{i=1}^n\sum_{j=1}^{m_i} \hat r_{ij}\exp\{-\hat\sigma_\diamond^2(t_{ij})\}\Big]^{-1}.$

We then estimate $\sigma^2(t)$ by

(4.6)  $\hat\sigma^2(t) = \frac{\exp\{\hat\sigma_\diamond^2(t)\}}{\hat\tau}.$

It is easy to see that the thus defined estimator $\hat\sigma^2(t)$ is always positive.
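
The steps in (4.3)-(4.6) can be sketched as follows: take logs of the
offset squared residuals, smooth them by a local linear fit in time, estimate
the scaling constant by an average over all observations, and exponentiate.
The code is a minimal illustration under stated assumptions: a Gaussian
kernel, observations flattened over subjects and visits, and hypothetical
function names. The example input is an artificial set of squared residuals
that follow a smooth curve exactly, used only as a sanity check.

```python
import math

def gauss_kernel(v):
    return math.exp(-0.5 * v * v)

def local_linear_level(t, times, values, weights, h):
    """Intercept of the kernel-weighted local linear fit of values on times at t."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y, w in zip(times, values, weights):
        d = x - t
        k = w * gauss_kernel(d / h)
        s0 += k
        s1 += k * d
        s2 += k * d * d
        t0 += k * y
        t1 += k * d * y
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)

def variance_estimator(times, sq_resid, weights, h1):
    """Return a function t -> estimated conditional variance at t.

    times, sq_resid and weights are flattened over subjects and visits;
    sq_resid holds the squared residuals and weights the per-observation w_i.
    """
    total = len(times)
    offset = 1.0 / total                      # small positive offset inside the log
    log_r = [math.log(r + offset) for r in sq_resid]

    def log_scale_fit(t):                     # local linear fit of the log residuals
        return local_linear_level(t, times, log_r, weights, h1)

    # Reciprocal of the estimated scaling constant, as an overall average.
    inv_tau = sum(r * math.exp(-log_scale_fit(t))
                  for t, r in zip(times, sq_resid)) / total
    return lambda t: math.exp(log_scale_fit(t)) * inv_tau

# Artificial check: squared residuals equal to exp(0.5 + t) on a regular grid.
grid = [k / 39.0 for k in range(40)]
sq_resid_demo = [math.exp(0.5 + t) for t in grid]
sigma2_hat = variance_estimator(grid, sq_resid_demo, [1.0 / 40] * 40, h1=0.3)
```

The estimate is an exponential divided by a positive average, so it is
positive by construction, and in this noise-free example it tracks the input
curve closely; the small discrepancy comes from the offset inside the log.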

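Suppose that there exist a sequence $\varphi_{n\diamond}(h_1)$, which depends on $h_1$, and
a constant $0 < \sigma_\diamond^2 < \infty$ such that

(4.7)  $\varphi_{n\diamond}(h_1) = o(\omega_n), \quad \frac{\varphi_{n\diamond}(h_1)}{h_1^2}\max_{1 \le i \le n} w_i^2 E\Big[\sum_{j=1}^{m_i} \xi_\diamond(t_{ij})K_1\Big(\frac{t_{ij} - t}{h_1}\Big)\Big]^2 = o(1)$

and

(4.8)  $\frac{\varphi_{n\diamond}(h_1)}{f_T^2(t)h_1^2} E\Big[\sum_{i=1}^n w_i\sum_{j=1}^{m_i} \xi_\diamond(t_{ij})K_1\Big(\frac{t_{ij} - t}{h_1}\Big)\Big]^2 \to \sigma_\diamond^2,$

which are similar to those in (3.5) and (3.6), where $f_T(\cdot)$ is the density
function of the observation times $t_{ij}$. Define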

$b_{\sigma 1}(t) = \frac{\sigma^2(t)\ddot\sigma_\diamond^2(t)}{2}\int v^2 K_1(v)\,dv,$

$b_{\sigma 2}(t) = \frac{\sigma^2(t)}{2}\int \ddot\sigma_\diamond^2(s)f_T(s)\,ds\int v^2 K_1(v)\,dv,$

where $\ddot\sigma_\diamond^2(\cdot)$ is the second-order derivative of $\sigma_\diamond^2(\cdot)$. We then establish the
asymptotic distribution of $\hat\sigma^2(t)$ in the following theorem, whose proof is
given in the supplementary material [Chen et al. (2015)].

Theorem 3. Suppose the conditions in Theorems 1 and 2,
Assumptions 6-9 in Appendix A, (4.7) and (4.8) are satisfied. Then we have

(4.9)  $\varphi_{n\diamond}^{1/2}(h_1)\{\hat\sigma^2(t) - \sigma^2(t) - [b_{\sigma 1}(t) - b_{\sigma 2}(t)]h_1^2\} \stackrel{d}{\longrightarrow} N(0, \sigma^4(t)\sigma_\diamond^2).$

Remark 4. Theorem 3 can be seen as an extension of Theorem 1 in
Chen, Cheng and Peng (2009) from the time series case to the longitudinal
data case. The longitudinal data framework in this paper is more flexible
and includes both sparse and dense data types. If $\xi_\diamond(t_{ij}) = v_i^\diamond + \varepsilon_{ij}^\diamond$, where $\varepsilon_{ij}^\diamond$
are i.i.d. across both $i$ and $j$ with $E[\varepsilon_{ij}^\diamond] = 0$ and $E[(\varepsilon_{ij}^\diamond)^2] < \infty$, and $\{v_i^\diamond\}$ is
an i.i.d. sequence of random variables with $E[v_i^\diamond] = 0$ and $E[(v_i^\diamond)^2] < \infty$ and
is independent of $\{\varepsilon_{ij}^\diamond\}$, following the discussion in Remark 3, we can again
show that the form of $\varphi_{n\diamond}(h_1)$ depends on the type of the longitudinal data,
and thus the nonparametric conditional variance estimation has different
convergence rates for sparse and dense data.

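We next discuss how to obtain the optimal value of the parameter
vector $\phi$. Construct the residuals $\hat{\mathbf{e}}_i = \mathbf{Y}_i - \mathbb{Z}_i\tilde\beta - \hat{\boldsymbol{\eta}}(\mathbb{X}_i, \tilde\theta)$, where $\hat{\boldsymbol{\eta}}(\mathbb{X}_i, \tilde\theta)$ is
defined in the same way as $\boldsymbol{\eta}(\mathbb{X}_i, \theta)$ but with $\eta(\cdot)$ and $\theta$ replaced by
$\hat\eta(\cdot) = \hat\eta(\cdot \mid \tilde\beta, \tilde\theta)$ and $\tilde\theta$, respectively. Let $\hat\Lambda_i = \hat\Lambda_i(\tilde\theta)$,
$\hat\Sigma_i = \mathrm{diag}\{\hat\sigma^2(t_{i1}), \ldots, \hat\sigma^2(t_{im_i})\}$,
and define $\mathbf{R}_i^*(\phi) = \hat\Sigma_i^{1/2}\mathbf{C}_i(\phi)\hat\Sigma_i^{1/2}$. Motivated by equations (3.1) and (3.2),
we construct

(4.10)  $\Omega_0^*(\phi) = \sum_{i=1}^n \hat\Lambda_i^T[\mathbf{R}_i^*(\phi)]^{-1}\hat\Lambda_i$

and

(4.11)  $\Omega_1^*(\phi) = \sum_{i=1}^n \hat\Lambda_i^T[\mathbf{R}_i^*(\phi)]^{-1}\hat{\mathbf{e}}_i\hat{\mathbf{e}}_i^T[\mathbf{R}_i^*(\phi)]^{-1}\hat\Lambda_i.$

By Theorem 1, the sandwich formula estimate $[\Omega_0^*(\phi)]^+\Omega_1^*(\phi)[\Omega_0^*(\phi)]^+$ is
asymptotically proportional to the asymptotic covariance of the proposed
SGEE estimators when the inverse of $\mathbf{R}_i^*(\phi)$ is chosen as the weight matrix.
The optimal value of $\phi$, denoted by $\hat\phi$, can be chosen to minimize the
determinant $|[\Omega_0^*(\phi)]^+\Omega_1^*(\phi)[\Omega_0^*(\phi)]^+|$. Such a method is called the minimum
generalized variance method [Fan, Huang and Li (2007)]. With the chosen
$\hat\phi$, we can estimate the covariance matrices by

(4.12)  $\mathbf{R}_i^*(\hat\phi) = \hat\Sigma_i^{1/2}\mathbf{C}_i(\hat\phi)\hat\Sigma_i^{1/2},$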

whose inverse will be used as the weight matrices in the SGEE method. 
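
As a rough illustration of the minimum generalized variance step, the sketch
below evaluates the determinant of the sandwich matrix on a grid of candidate
correlation parameters and keeps the minimizer. It is a minimal sketch under
stated assumptions, not the authors' implementation: the AR(1)-type working
correlation, the grid search, and the names `ar1_corr`, `generalized_variance`
and `choose_rho` are all hypothetical, and the design blocks, residuals and
variance estimates are taken as given inputs.

```python
import numpy as np

def ar1_corr(times, rho):
    """Working correlation rho^{|t - s|} on possibly irregular times (assumed form)."""
    lag = np.abs(times[:, None] - times[None, :])
    return rho ** lag

def generalized_variance(rho, lambdas, residuals, sigma2s, times):
    """Determinant of Omega0^+ Omega1 Omega0^+ for one candidate rho."""
    k = lambdas[0].shape[1]
    omega0 = np.zeros((k, k))
    omega1 = np.zeros((k, k))
    for lam, e, s2, t in zip(lambdas, residuals, sigma2s, times):
        sd = np.sqrt(s2)
        r = sd[:, None] * ar1_corr(t, rho) * sd[None, :]  # Sigma^{1/2} C Sigma^{1/2}
        r_inv_lam = np.linalg.solve(r, lam)                # R^{-1} Lambda_i
        omega0 += lam.T @ r_inv_lam
        score = r_inv_lam.T @ e                            # Lambda_i^T R^{-1} e_i
        omega1 += np.outer(score, score)
    omega0_pinv = np.linalg.pinv(omega0)                   # Moore-Penrose inverse
    return np.linalg.det(omega0_pinv @ omega1 @ omega0_pinv)

def choose_rho(grid, lambdas, residuals, sigma2s, times):
    """Grid search (an assumed strategy) for the smallest generalized variance."""
    values = [generalized_variance(r, lambdas, residuals, sigma2s, times) for r in grid]
    return grid[int(np.argmin(values))], values
```

A grid search is used only because it is easy to read; any one-dimensional
optimizer could take its place when the correlation family has a single
parameter.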
5. Numerical studies. In this section, we first study the finite sample
performance of the proposed SGEE estimators through Monte Carlo simulation,
and then give an empirical application of the proposed model and
methodology.


5.1. Simulation studies. We investigate both sparse and dense longitudinal
data cases with an average time dimension m of 10 for the sparse
data and 30 for the dense data. We use two types of within-subject
correlation structure, AR(1) and ARMA(1,1), in the error terms $e_i(t_{ij})$. We
investigate the finite sample performance of the proposed estimators under
both correct specification and misspecification of the correlation structure in
the construction of the covariance matrix estimator proposed in Section 4.
For the misspecified case, we fit an AR(1) correlation structure while the
true underlying structure is ARMA(1,1) and examine the robustness of the
estimators.

Simulated data are generated from model (1.2) with two-dimensional
$Z_i(t_{ij})$ and three-dimensional $X_i(t_{ij})$, and

$\beta_0 = (2, 1)^T$, $\theta_0 = (2, 1, 2)^T/3$ and $\eta(u) = 0.5\exp(u)$.

The covariates $(Z_i^T(t_{ij}), X_i^T(t_{ij}))^T$ are generated independently from a
five-dimensional Gaussian distribution with mean 0, variance 1 and pairwise
correlation 0.1. The observation times $t_{ij}$ are generated in the same way
as in Fan, Huang and Li (2007): for each subject, {0, 1, 2, ..., T} is a set of
scheduled times, and each scheduled time from 1 to T has a 0.2 probability
of being skipped; each actual observation time is a perturbation of a
nonskipped scheduled time; that is, a uniform [0, 1] random number is added to
the nonskipped scheduled time. Here T is set to be 12 or 36, which
corresponds to an average time dimension of m = 10 or m = 30, respectively. For
each i, the error terms $e_i(t_{ij})$ are generated from a Gaussian process with
mean 0, variance function

(5.1)  $\mathrm{var}[e(t)] = \sigma^2(t) = 0.25\exp(t/12)$

and serial correlation structure

(5.2)  $\mathrm{cor}(e(t), e(s)) = 1$ if $t = s$, and $\mathrm{cor}(e(t), e(s)) = \gamma\rho^{|t-s|}$ if $t \neq s$.

Note that (5.2) corresponds to an ARMA(1,1) correlation structure and
reduces to an AR(1) correlation structure when $\gamma = 1$. The number of
subjects, n, is taken to be 30 or 50. The values for $\gamma$ and $\rho$ are
$(\gamma, \rho) = (0.85, 0.9)$ in the ARMA(1,1) correlation structure and
$(\gamma, \rho) = (1, 0.9)$ in the AR(1) structure.
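
The data-generating design described above can be sketched in a few lines.
This is an illustrative reading of the description rather than the authors'
code: the function names (`observation_times`, `error_covariance`,
`simulate_subject`) are hypothetical, the off-diagonal correlation is taken to
be $\gamma\rho^{|t-s|}$, and the response is assumed to follow the partially
linear single-index form $Y = Z^T\beta_0 + \eta(X^T\theta_0) + e$.

```python
import numpy as np

def observation_times(T, rng, p_skip=0.2):
    """Scheduled times 0..T; times 1..T are skipped w.p. p_skip, then jittered by U[0,1)."""
    scheduled = np.arange(T + 1)
    keep = np.ones(T + 1, dtype=bool)
    keep[1:] = rng.random(T) >= p_skip          # time 0 is never skipped
    return scheduled[keep] + rng.uniform(0.0, 1.0, keep.sum())

def error_covariance(t, gamma, rho):
    """Covariance with variance 0.25*exp(t/12) and correlation gamma*rho^{|t-s|} off the diagonal."""
    sd = np.sqrt(0.25 * np.exp(t / 12.0))
    corr = gamma * rho ** np.abs(t[:, None] - t[None, :])
    np.fill_diagonal(corr, 1.0)
    return sd[:, None] * corr * sd[None, :]

def simulate_subject(T, gamma, rho, rng):
    """One subject's observation times, covariates and responses."""
    t = observation_times(T, rng)
    m = t.size
    cov_zx = np.full((5, 5), 0.1)
    np.fill_diagonal(cov_zx, 1.0)
    zx = rng.multivariate_normal(np.zeros(5), cov_zx, size=m)
    Z, X = zx[:, :2], zx[:, 2:]
    beta0 = np.array([2.0, 1.0])
    theta0 = np.array([2.0, 1.0, 2.0]) / 3.0
    e = rng.multivariate_normal(np.zeros(m), error_covariance(t, gamma, rho))
    Y = Z @ beta0 + 0.5 * np.exp(X @ theta0) + e
    return t, Z, X, Y
```

Calling `simulate_subject(12, 0.85, 0.9, rng)` for each of n subjects gives one
sparse ARMA(1,1) data set; setting `gamma=1.0` gives the AR(1) case.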

Table 1
Performance of parameter estimation methods under correct specification of an
underlying AR(1) correlation structure

                                n = 30                       n = 50
m   Parameter   Method     Bias       SD      MAD       Bias       SD      MAD
10  $\beta_1$   PULS     0.0048   0.0402   0.0288    -0.0030   0.0308   0.0195
                SGEE    -0.0026   0.0508   0.0081    -0.0016   0.0259   0.0074
    $\beta_2$   PULS    -0.0024   0.0409   0.0243     0.0049   0.0267   0.0180
                SGEE    -0.0018   0.0298   0.0110     0.0033   0.0310   0.0077
    $\theta_1$  PULS    -0.0049   0.0299   0.0180    -0.0009   0.0197   0.0134
                SGEE    -0.0013   0.0164   0.0083    -0.0002   0.0118   0.0046
    $\theta_2$  PULS     0.0011   0.0380   0.0229    -0.0016   0.0237   0.0161
                SGEE     0.0026   0.0188   0.0100     0.0006   0.0108   0.0067
    $\theta_3$  PULS     0.0018   0.0314   0.0188     0.0006   0.0203   0.0147
                SGEE    -0.0007   0.0182   0.0090    -0.0004   0.0088   0.0052
30  $\beta_1$   PULS     0.0003   0.0408   0.0277     0.0016   0.0328   0.0222
                SGEE    -0.0081   0.1134   0.0106     0.0007   0.0108   0.0083
    $\beta_2$   PULS    -0.0020   0.0425   0.0317     0.0005   0.0351   0.0202
                SGEE    -0.0017   0.0420   0.0096    -0.0064   0.0152   0.0079
    $\theta_1$  PULS     0.0020   0.0315   0.0213    -0.0020   0.0244   0.0182
                SGEE    -0.0008   0.0247   0.0075     0.0001   0.0148   0.0064
    $\theta_2$  PULS    -0.0035   0.0340   0.0240    -0.0083   0.0278   0.0163
                SGEE    -0.0027   0.0242   0.0090    -0.0013   0.0104   0.0066
    $\theta_3$  PULS    -0.0027   0.0321   0.0185     0.0045   0.0267   0.0169
                SGEE     0.0009   0.0230   0.0074     0.0001   0.0162   0.0068

For each combination of m, n, and the correlation structure, the number
of simulation replications is 200. For the selection of the bandwidth, however,

due to the running time limitation, we first run a leave-one-unit-out (i.e.,
leave out observations from one subject at a time) cross-validation (CV) to
choose the optimal bandwidths from 20 replications. We then use the average
of the optimal bandwidths from these 20 replications as the bandwidth
in the 200 replications of the simulation study. For the SGEE method, we
choose the weight matrix as the inverse of the estimated within-subject
covariance matrix as constructed in (4.12) of Section 4. We first study the
performance of the proposed estimators in the case where the correlation
structure in the estimation of the covariance matrix is correctly specified,
and then investigate the robustness of the estimators to the misspecification
of the correlation structure. The bias, calculated as the average of the
estimates from the 200 replications minus the true parameter values, the
standard deviation (SD), calculated as the sample standard deviation of the
200 estimates, and the median absolute deviation (MAD), calculated as the
median absolute deviation of the 200 estimates, are reported in Tables 1










Table 2
Performance of parameter estimation methods under correct specification of an
underlying ARMA(1,1) correlation structure

                                n = 30                       n = 50
m   Parameter   Method     Bias       SD      MAD       Bias       SD      MAD
10  $\beta_1$   PULS    -0.0029   0.0400   0.0280     0.0006   0.0322   0.0221
                SGEE    -0.0025   0.0244   0.0155     0.0000   0.0193   0.0124
    $\beta_2$   PULS     0.0032   0.0386   0.0282    -0.0045   0.0299   0.0205
                SGEE     0.0009   0.0249   0.0171     0.0001   0.0212   0.0126
    $\theta_1$  PULS    -0.0004   0.0267   0.0181    -0.0003   0.0188   0.0126
                SGEE    -0.0002   0.0161   0.0104     0.0006   0.0146   0.0073
    $\theta_2$  PULS    -0.0047   0.0343   0.0209     0.0005   0.0223   0.0156
                SGEE    -0.0031   0.0192   0.0113    -0.0002   0.0145   0.0087
    $\theta_3$  PULS     0.0008   0.0253   0.0158    -0.0009   0.0201   0.0121
                SGEE     0.0011   0.0148   0.0102    -0.0009   0.0146   0.0074
30  $\beta_1$   PULS    -0.0026   0.0450   0.0296    -0.0016   0.0374   0.0273
                SGEE     0.0005   0.0214   0.0138     0.0015   0.0288   0.0105
    $\beta_2$   PULS    -0.0013   0.0461   0.0291     0.0035   0.0361   0.0252
                SGEE     0.0040   0.0335   0.0147     0.0014   0.0152   0.0104
    $\theta_1$  PULS    -0.0014   0.0296   0.0192    -0.0010   0.0207   0.0159
                SGEE    -0.0005   0.0166   0.0095     0.0006   0.0092   0.0063
    $\theta_2$  PULS    -0.0050   0.0355   0.0231     0.0011   0.0229   0.0173
                SGEE    -0.0037   0.0371   0.0120    -0.0003   0.0116   0.0072
    $\theta_3$  PULS     0.0017   0.0279   0.0186    -0.0006   0.0215   0.0154
                SGEE     0.0009   0.0181   0.0095    -0.0007   0.0100   0.0070

and 2. Table 1 gives the results obtained under the correct specification of
an underlying within-subject AR(1) correlation structure in $e_i(t_{ij})$, and
Table 2 gives those obtained under the correct specification of an underlying
ARMA(1,1) structure in $e_i(t_{ij})$. For comparison, we also report the results
from the PULS estimation. The results in Tables 1 and 2 show that the
SGEE estimates are comparable with the PULS estimates in terms of bias
and are more efficient than the PULS estimates, which supports the
asymptotic theory developed in Section 3. In Figures 1 and 2, we plot the local
linear estimated link function from a typical realization together with the
real curve for each combination of n and m.
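
The three summary measures reported in the tables can be computed from a
matrix of replicated estimates as sketched below. The function name
`summarize` is hypothetical, and two details are assumptions because the text
does not spell them out: the MAD is centered at the sample median and is not
rescaled by any consistency constant.

```python
import numpy as np

def summarize(estimates, truth):
    """Bias, SD and MAD of replicated estimates (rows = replications, columns = parameters)."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean(axis=0) - np.asarray(truth, dtype=float)
    sd = est.std(axis=0, ddof=1)                   # sample standard deviation
    center = np.median(est, axis=0)
    mad = np.median(np.abs(est - center), axis=0)  # unscaled median absolute deviation
    return bias, sd, mad
```

Because the MAD is insensitive to a few extreme replications, it can be much
smaller than the SD when an estimator occasionally produces outlying values.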

To investigate the robustness of the SGEE and PULS estimators to
correlation structure misspecification, we also carry out a simulation study in
which an AR(1) correlation structure is used in the covariance matrix
estimation detailed in Section 4, when the true underlying correlation structure
is ARMA(1,1). Table 3 reports the results under this misspecification. The
table shows that in the presence of correlation structure misspecification,
SGEE still produces more efficient parameter estimates than PULS.

Fig. 1. Estimated link function (dot-dashed line), together with the true link function
(solid line), from a typical realization of model (1.2) with AR(1) correlation structure for
each combination of n and m: (a) n = 30, m = 10; (b) n = 50, m = 10; (c) n = 30, m = 30;
(d) n = 50, m = 30.

We also include a simulated example where the covariates in Z follow
discrete distributions. The same model as above is used except that the
covariates $X_i(t_{ij})$ are drawn independently from a three-dimensional
Gaussian distribution with mean 0, variance 1 and pairwise correlation 0.1, and
$Z_i(t_{ij})$ are independently drawn from a binomial distribution with success
probability 0.5. The errors $e_i(t_{ij})$ are generated with the AR(1) serial
correlation structure of $(\gamma, \rho) = (1, 0.9)$. The simulation results for this
example are presented in Table 4. The same finding as above can be obtained. Some
additional results, that is, those on the average angles between the estimated
and the true parameter vectors, are given in Appendix D of the
supplementary material [Chen et al. (2015)].

Fig. 2. Estimated link function (dot-dashed line), together with the true link function
(solid line), from a typical realization of model (1.2) with ARMA(1,1) correlation structure
for each combination of n and m: (a) n = 30, m = 10; (b) n = 50, m = 10; (c) n = 30,
m = 30; (d) n = 50, m = 30.

Table 3
Performance of parameter estimation methods under misspecification of an underlying
ARMA(1,1) correlation structure

                                n = 30                       n = 50
m   Parameter   Method     Bias       SD      MAD       Bias       SD      MAD
10  $\beta_1$   PULS     0.0072   0.0410   0.0357    -0.0038   0.0299   0.0201
                SGEE    -0.0054   0.0261   0.0210    -0.0055   0.0211   0.0147
    $\beta_2$   PULS     0.0068   0.0336   0.0256     0.0037   0.0290   0.0163
                SGEE     0.0025   0.0267   0.0157     0.0023   0.0190   0.0136
    $\theta_1$  PULS     0.0037   0.0166   0.0114     0.0061   0.0157   0.0096
                SGEE     0.0033   0.0144   0.0122     0.0016   0.0163   0.0081
    $\theta_2$  PULS    -0.0092   0.0303   0.0184    -0.0084   0.0224   0.0174
                SGEE    -0.0007   0.0198   0.0144    -0.0045   0.0203   0.0130
    $\theta_3$  PULS    -0.0005   0.0229   0.0158    -0.0028   0.0160   0.0111
                SGEE    -0.0035   0.0141   0.0094     0.0000   0.0134   0.0092
30  $\beta_1$   PULS     0.0066   0.0403   0.0259    -0.0221   0.0502   0.0252
                SGEE     0.0093   0.0144   0.0087     0.0001   0.0165   0.0118
    $\beta_2$   PULS    -0.0138   0.0435   0.0353     0.0107   0.0312   0.0233
                SGEE    -0.0017   0.0268   0.0096     0.0035   0.0170   0.0096
    $\theta_1$  PULS     0.0027   0.0252   0.0165     0.0020   0.0181   0.0067
                SGEE     0.0054   0.0136   0.0078     0.0019   0.0096   0.0098
    $\theta_2$  PULS    -0.0063   0.0265   0.0245     0.0021   0.0315   0.0273
                SGEE     0.0009   0.0198   0.0118     0.0046   0.0136   0.0094
    $\theta_3$  PULS    -0.0011   0.0285   0.0258    -0.0042   0.0217   0.0136
                SGEE    -0.0065   0.0178   0.0137    -0.0046   0.0120   0.0084

5.2. Real data analysis. We next illustrate the partially linear single-index
model and the proposed SGEE estimation method through an empirical
example which explores the relationship between lung function and air
pollution. There is a voluminous literature studying the effects of air
pollution on people's health. For a review of the literature, the reader is referred
to Pope, Bates and Raizenne (1995). Many studies have found associations
between air pollution and health problems such as increased respiratory
symptoms, decreased lung function, increased hospitalizations or hospital
visits for respiratory and cardiovascular diseases and increased respiratory
morbidity [Dockery et al. (1989), Kinney et al. (1989), Pope (1991),
Braun-Fahrlander et al. (1992), Lipfert and Hammerstrom (1992)]. While earlier
research often used time series or cross-sectional data to evaluate the health
effects of air pollution, recent advances in longitudinal data analysis
techniques offer greater opportunities for studying this problem. In this paper,
we will examine whether air pollution has a significant adverse effect on
lung function, and, if so, to what extent. The use of the partially linear
single-index model and the SGEE method would provide greater modeling
flexibility than linear models and allow the within-subject correlation to be
adequately taken into account. We will use a longitudinal data set obtained
from a study where a total of 971 4th-grade children aged between 8 and 14

years (at their first visit to the hospital/clinic) were followed over 10 years.
For each yearly visit of the children to the hospital/clinic, records on their
forced expiratory volume (FEV), asthma symptom at visit (ASSPM, 1 for
those with symptoms and 0 for those without), asthmatic status (ASS, 1 for
asthma patient and 0 for nonasthma patient), gender (G, 1 for males and
0 for females), race (R, 1 for nonwhites and 0 for whites), age (A), height
(H), BMI and respiratory infection at visit (RINF, 1 for those with infection
and 0 for those without) were taken. Together with the measurements from
the children, the mean levels of ozone and NO2 in the month prior to the
visit were also recorded. Due to dropout or other reasons, the majority of
children had 4 to 5 years of records, and the total number of observations
in the data set is 3809.

Table 4
Performance of parameter estimation methods under correct specification of an
underlying AR(1) correlation structure when the covariates in Z are discrete

                                n = 30                       n = 50
m   Parameter   Method     Bias       SD      MAD       Bias       SD      MAD
10  $\beta_1$   PULS     0.0215   0.0530   0.0404     0.0018   0.0646   0.0472
                SGEE     0.0228   0.0511   0.0208     0.0037   0.0298   0.0138
    $\beta_2$   PULS    -0.0309   0.0858   0.0735     0.0193   0.0526   0.0498
                SGEE     0.0024   0.0313   0.0193     0.0074   0.0339   0.0274
    $\theta_1$  PULS    -0.0012   0.0185   0.0090    -0.0116   0.0201   0.0175
                SGEE    -0.0060   0.0157   0.0082     0.0020   0.0086   0.0066
    $\theta_2$  PULS    -0.0020   0.0263   0.0232     0.0138   0.0229   0.0172
                SGEE     0.0122   0.0241   0.0143    -0.0004   0.0087   0.0063
    $\theta_3$  PULS     0.0012   0.0206   0.0075     0.0036   0.0153   0.0132
                SGEE    -0.0008   0.0078   0.0048    -0.0020   0.0070   0.0034
30  $\beta_1$   PULS     0.0075   0.0427   0.0222     0.0108   0.0723   0.0513
                SGEE     0.0061   0.0284   0.0233     0.0033   0.0226   0.0175
    $\beta_2$   PULS    -0.0143   0.0768   0.0401     0.0023   0.0681   0.0417
                SGEE     0.0116   0.0275   0.0125    -0.0039   0.0259   0.0196
    $\theta_1$  PULS    -0.0159   0.0310   0.0252     0.0031   0.0218   0.0168
                SGEE    -0.0030   0.0083   0.0045     0.0015   0.0098   0.0064
    $\theta_2$  PULS    -0.0026   0.0192   0.0112     0.0048   0.0252   0.0200
                SGEE     0.0040   0.0200   0.0133     0.0002   0.0115   0.0084
    $\theta_3$  PULS     0.0151   0.0331   0.0308    -0.0067   0.0228   0.0150
                SGEE     0.0006   0.0133   0.0083    -0.0018   0.0103   0.0064

As in many other studies, the FEV will be used as a measure of lung
function, and its log-transformed values, log(FEV), will be used as the
response values in our model. Our main interest is to determine whether higher

levels of ozone and NO2 would lead to decrements in lung function. To
account for the effects of other confounding factors, we include all other
recorded variables. As age and height exhibit strong collinearity (with a
correlation of 0.78), we will only use height in the study. In fitting the
partially linear single-index model to the data, all the continuous variables (i.e.,
FEV, H, BMI, OZONE and NO2) are log-transformed, and log(BMI),
log(OZONE) and log(NO2) are included in the single-index part. The log(H)
and all the binary variables are included in the linear part of the model.

Fig. 3. The scatter plots of the response variable log(FEV) against the continuous
regressors, that is, (clockwise from top left) log(H), log(BMI), log(NO2), log(OZONE).

The scatter plots of the response variable against the continuous
regressors are shown in Figure 3, and the box plots of the response against the
binary regressors are given in Figure 4. We use an ARMA(1,1)
within-subject correlation structure in the estimation of the covariance matrix for
the proposed SGEE method. The resulting estimated model is as follows:

log(FEV) ≈ 0.0325 * G - 0.0111 * ASS - 0.0671 * R
           (0.0041)     (0.0080)       (0.0059)
         - 0.0047 * ASSPM - 0.0068 * RINF + 2.3206 * log(H)
           (0.0085)         (0.0043)        (0.0307)
         + η̂[0.9929 * log(BMI) - 0.0924 * log(OZONE) - 0.0753 * log(NO2)],
             (0.0560)            (0.0127)              (0.0125)

where the numbers in the parentheses under the estimated coefficients are
their respective estimated standard errors. The estimated link function and
its 95% point-wise confidence intervals are plotted in Figure 5.

Fig. 4. The box plots of the response variable log(FEV) against the binary regressors,
that is, (clockwise from top left) G, ASS, R, RINF, ASSPM.

From Figure 5, it can be seen that the estimated link function is overall
increasing. The 95% point-wise confidence intervals show that a linear
functional form for the unknown link function would be rejected, and thus the
partially linear single-index model might be more appropriate than the
traditional linear regression model. Meanwhile, it can be seen from the above
estimated model that height and BMI are significant positive factors in
accounting for lung function. Taller children and children with larger BMI tend
to have higher FEV. In addition, male and white children have, on average,
higher FEV than female or nonwhite children. Furthermore, both OZONE
and NO2 in the single-index component have negative effects on children's
lung function, as the estimated coefficients for OZONE and NO2 are
negative, and the estimated link function is increasing. Although these negative
effects are relatively small in magnitude compared to the effect of BMI, they
are statistically significant. This means that higher levels of ozone and NO2
tend to lead to reduced lung function as represented by lower values of FEV.
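
The significance statement can be checked informally from the reported
estimates and standard errors of the single-index coefficients. The snippet
below is only a back-of-the-envelope sketch: it assumes a Wald-type ratio
judged against the two-sided 5% normal critical value 1.96, which is an
assumption about how significance would be assessed and is not stated in the
text; the names `index_coefs` and `wald_ratio` are hypothetical.

```python
# (estimate, standard error) pairs copied from the fitted single-index part
index_coefs = {
    "log(BMI)": (0.9929, 0.0560),
    "log(OZONE)": (-0.0924, 0.0127),
    "log(NO2)": (-0.0753, 0.0125),
}

def wald_ratio(estimate, std_error):
    """Estimate divided by its standard error."""
    return estimate / std_error

ratios = {name: wald_ratio(*pair) for name, pair in index_coefs.items()}
# Assumed rule: flag a coefficient when |ratio| exceeds 1.96.
significant = {name: abs(z) > 1.96 for name, z in ratios.items()}
```

Since the estimated link function is increasing, a negative index coefficient
translates into a negative marginal effect on log(FEV), which is the direction
described in the text.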

Fig. 5. The estimated link function and its 95% point-wise confidence intervals.

6. Conclusions and discussions. In this paper, we study a partially linear
single-index modeling structure for possibly unbalanced longitudinal data in
a general framework, which includes both the sparse and dense longitudinal
data cases. An SGEE method with the first-stage local linear smoothing is
introduced to estimate the two parameter vectors as well as the unspecified
link function.

In Theorems 1 and 2, we derive the asymptotic properties of the
proposed parametric and nonparametric estimators in different scenarios, from
which we find that the convergence rates and asymptotic variances of the
resulting estimators in the sparse longitudinal data case could be
substantially different from those in the dense longitudinal data case. In Section 4,
we propose a semiparametric method to estimate the error covariance matrices
which are involved in the estimation equations. The conditional variance
function is estimated by using the log-transformed local linear method, and
the parameters in the correlation matrices are estimated by the minimum
generalized variance method. In particular, if the correlation matrices are
correctly specified, as is stated in Corollary 1, the SGEE-based estimators
of $\beta$ and $\theta$ are generally asymptotically more efficient than the
corresponding PULS estimators, in the sense that the asymptotic covariance
matrix of the SGEE estimators minus that of the PULS estimators is
negative semi-definite. Both the simulation study and empirical data analysis in
Section 5 show that the proposed methods work well in finite samples.

Recently, Yao and Li (2013) developed a new nonparametric regression
function estimation method for a longitudinal regression model. This method
takes into account the within-subject correlation information and thus
generally improves the asymptotic estimation efficiency. It would also be
interesting to incorporate the within-subject correlation information in the local
linear estimation of the unknown link function in this paper and to examine
both theoretical and empirical performance of the resulting estimator. We
will leave this issue for future research. Another possible future topic is to
extend the semiparametric techniques of variable selection and specification
testing proposed by Liang et al. (2010) from the i.i.d. case to the general
longitudinal data case discussed in the present paper.

APPENDIX A: REGULARITY CONDITIONS 

To establish the asymptotic properties of the SGEE estimators proposed 
in Section 2, we introduce the following regularity conditions, although some 
of them might not be the weakest possible. 

Assumption 1. The kernel function $K(\cdot)$ is a bounded and symmetric
probability density function with compact support. Furthermore, the kernel
function has a continuous first-order derivative function denoted by $\dot{K}(\cdot)$.

Assumption 2. (i) The errors $e_{ij} = e_i(t_{ij})$, $1 \le i \le n$, $1 \le j \le m_i$, are
independent across $i$; that is, $e_i$ defined in Section 2, $1 \le i \le n$, are mutually
independent.

(ii) The covariates $X_{ij}$ and $Z_{ij}$, $1 \le i \le n$, $1 \le j \le m_i$, are i.i.d. random
vectors.

(iii) The errors $e_{ij}$ are independent of the covariates $Z_{ij}$ and $X_{ij}$, and
for each $i$, $e_{ij}$, $1 \le j \le m_i$, may be correlated with each other. Furthermore,
$E[e_{ij}] = 0$, $0 < E[e_{ij}^2] < \infty$ and $E[|e_{ij}|^{2+\delta}] < \infty$ for some $\delta > 0$. The largest
eigenvalues of $W_i$ and $W_i E[e_i e_i^T] W_i$ are bounded for any $i$.

Assumption 3. (i) The density function $f_\theta(\cdot)$ of $X_{ij}^T\theta$ is positive and
has a continuous second-order derivative in $\mathcal{U} = \{x^T\theta : x \in \mathcal{X}, \theta \in \Theta\}$, where
$\Theta$ is a compact parameter space for $\theta$ and $\mathcal{X}$ is a compact support of $X_{ij}$.

(ii) The function $\rho_Z(u|\theta) = E[Z_{ij} | X_{ij}^T\theta = u]$ has a bounded and continuous
second-order derivative (with respect to $u$) for any $\theta \in \Theta$, and
$E[\|Z_{ij}\|^{2+\delta}] < \infty$, where $\delta$ was defined in Assumption 2(iii).

Assumption 4. The link function $\eta(\cdot)$ has continuous derivatives up to
the second order.


Assumption 5. 
(A.l) uj n h 6 — >0, 


The bandwidth h satisfies 

n 2 h? rji2/(2+S) ^g n 

- —- —y oo - 

N n (h) logn ’ h 2 N n {h) 


o(l), 


where N n (h ) = ^T"=i 1 /(rriih), T n = Y^i=i m i an d <5 was defined in Assump¬ 
tion 2 (iii)- Furthermore, maxi +mfh~ 1 ) = o(w n ). 


Remark 5. Assumption 1 imposes some mild restrictions on the kernel 
functions, which have been used in the existing literature in i.i.d. and weakly 
dependent time series cases; see, for example, Fan and Gijbels (1996) and 
Gao (2007). The compact support restriction on the kernel functions can be 
removed if we impose certain restrictions on the tail of the kernel function. 
In Assumption 2(i), the longitudinal data under investigation is assumed to 
be independent across subjects i, which is not uncommon in longitudinal 
data analysis; see, for example, Wu and Zhang (2006) and Zhang, Fan and 
Sun (2009). Assumption 2(ii) is imposed to simplify the presentation of 
the asymptotic results. However, we may replace Assumption 2(ii) with the 
conditions that the covariates X ?J and Z - L j are i.i.d. across i and identically 
distributed across j, and in the case of dense longitudinal data, it is further 
satisfied that for re = 0,1,2,..., 


(A.2) Var 


m . 


mi TT 

E U ij 
h 

3 =1 


Xj0-u 


K 


Xf 0 - u 


< C[rriih) 


-i 
uniformly for $u \in \mathcal{U}$ and $\theta \in \Theta$, where $U_{ij}$ can be 1, $Z_{ij}B_1(Z_{ij})$, or $X_{ij}B_2(X_{ij})$,
$B_1(\cdot)$ and $B_2(\cdot)$ are two bounded functions, and $C$ is a positive constant
which is independent of $i$. When $X_{ij}$ and $Z_{ij}$ are stationary and $\alpha$-mixing
dependent across $j$ for the case of dense longitudinal data, it is easy to
validate the high-level condition (A.2). In Assumption 2(iii), we allow the error
terms to have certain within-subject correlation, which makes the model
assumptions more realistic. Assumption 3 gives some commonly-used
conditions in partially linear single-index models; see Xia and Härdle (2006) and
Chen, Gao and Li (2013b), for example. Assumption 4 is a mild smoothness
condition on the link function imposed for the application of the local linear
fitting. Assumption 5 gives a set of restrictions on the bandwidth $h$, which
is involved in the estimation of the link function. Note that the bandwidth
conditions in Assumption 5 imply that the milder bandwidth conditions in
(C.1) of Lemma 1 in the supplemental material [Chen et al. (2015)] are
satisfied. Hence we can use Lemma 1 to prove our main theoretical results.
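
To make the roles of the kernel and the bandwidth concrete, the following
sketch implements a plain local linear fit with a kernel that meets the
requirements of Assumption 1. It is a simplified illustration and not the
weighted estimator of the paper: observations are pooled with equal weights,
the biweight kernel is one possible choice (its derivative vanishes at the
boundary of the support, so it is continuous everywhere), and the names
`biweight` and `local_linear` are hypothetical.

```python
import numpy as np

def biweight(v):
    """Biweight kernel (15/16)(1 - v^2)^2 on [-1, 1]: bounded, symmetric, compactly supported."""
    v = np.asarray(v, dtype=float)
    return np.where(np.abs(v) <= 1.0, 15.0 / 16.0 * (1.0 - v**2) ** 2, 0.0)

def local_linear(u0, index, response, h, kernel=biweight):
    """Local linear estimates of the link function and its first derivative at u0."""
    d = np.asarray(index, dtype=float) - u0
    y = np.asarray(response, dtype=float)
    w = kernel(d / h)
    s0, s1, s2 = w.sum(), (w * d).sum(), (w * d * d).sum()
    t0, t1 = (w * y).sum(), (w * d * y).sum()
    denom = s0 * s2 - s1**2
    if denom <= 0:
        raise ValueError("too few observations within bandwidth h of u0")
    level = (s2 * t0 - s1 * t1) / denom   # intercept of the weighted least squares line
    slope = (s0 * t1 - s1 * t0) / denom   # slope of the weighted least squares line
    return level, slope
```

A local linear fit reproduces any exactly linear relationship regardless of
the bandwidth, which is why its leading bias involves only the second
derivative of the link function.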

We next give some regularity conditions, which are needed to derive the 
asymptotic property of the nonparametric conditional variance estimators 
in Section 4. 

Assumption 6. The kernel function $K_1(\cdot)$ is a continuous and
symmetric probability density function with compact support.

Assumption 7. The observation times, $t_{ij}$, are i.i.d. and have a
continuous and positive probability density function which has a compact
support $\mathcal{T}$. The density function of $e^2(t_{ij})$ is continuous and bounded. Let
$\delta > 2$, which strengthens the moment conditions in Assumptions 2 and 3.

Assumption 8. The conditional variance function $\sigma^2(\cdot)$ has a
continuous second-order derivative and satisfies $\inf_{t \in \mathcal{T}} \sigma^2(t) > 0$. Let $\dot{\sigma}^2(\cdot)$ and
$\ddot{\sigma}^2(\cdot)$ be its first-order and second-order derivative functions, respectively.

Assumption 9. The bandwidth h\ satisfies 



(A.3) 


where $N_n(h_1) = \sum_{i=1}^n 1/(m_i h_1)$.


Remark 6. Assumption 7 imposes a mild condition on the observation
times [see, e.g., Jiang and Wang (2011)] and strengthens the moment
conditions on $e_{ij}$ and $Z_{ij}$. However, such moment conditions are not uncommon
in the asymptotic theory for nonparametric conditional variance estimation
[Chen, Cheng and Peng (2009)]. Since the local linear smoothing technique
is applied, a certain smoothness condition has to be assumed on $\sigma^2(\cdot)$, as
is done in Assumption 8. Assumption 9 gives some mild restrictions on the
bandwidth $h_1$, which is used in the estimation of the conditional variance
function.


APPENDIX B: PROOFS OF THE MAIN RESULTS 

In this appendix, we provide the detailed proofs of the main results given 
in Section 3. 


B.1. Proof of Theorem 1. By the definition of the weighted local linear
estimators in (2.4) and (2.5), we have

$\hat{\eta}(u|\beta,\theta) - \eta(u) = \sum_{i=1}^n s_i(u|\theta)(Y_i - Z_i\beta) - \eta(u)$

$\quad = \sum_{i=1}^n s_i(u|\theta) e_i + \sum_{i=1}^n s_i(u|\theta) Z_i(\beta_0 - \beta)$

(B.1)  $\quad + \sum_{i=1}^n s_i(u|\theta)[\eta(X_i,\theta_0) - \eta(X_i,\theta)]$

$\quad + \sum_{i=1}^n s_i(u|\theta)\eta(X_i,\theta) - \eta(u)$

$\quad = I_{n1} + I_{n2} + I_{n3} + I_{n4}.$

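For $I_{n1}$, note that by a first-order Taylor expansion of $K(\cdot)$, we have, for
$i = 1, \ldots, n$ and $j = 1, \ldots, m_i$,

$K\left(\frac{X_{ij}^T\theta - u}{h}\right) = K\left(\frac{X_{ij}^T\theta_0 - u}{h}\right) + \dot{K}\left(\frac{X_{ij}^T\theta_* - u}{h}\right) \frac{X_{ij}^T(\theta - \theta_0)}{h},$

where $\dot{K}(\cdot)$ is the first-order derivative of $K(\cdot)$ and $\theta_* = \theta_0 + \lambda_*(\theta - \theta_0)$,
$0 < \lambda_* < 1$. Hence, by some standard calculations and the assumption that
$n^2h^2/\{N_n(h)\log n\} \to \infty$, we have

$I_{n1} = \sum_{i=1}^n s_i(u|\theta_0) e_i + \sum_{i=1}^n [s_i(u|\theta) - s_i(u|\theta_0)] e_i$

$\quad = \sum_{i=1}^n s_i(u|\theta_0) e_i + O_P\left(\|\theta - \theta_0\| \cdot \frac{\sqrt{N_n(h)\log n}}{nh}\right)$

(B.2)  $\quad = \sum_{i=1}^n s_i(u|\theta_0) e_i + o_P(\|\theta - \theta_0\|)$

for any $u \in \mathcal{U}$ and $\theta \in \Theta$.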

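By Lemma 2 in the supplementary material [Chen et al. (2015)], we can
prove that

(B.3)  $I_{n2} = -\rho_Z^T(u)(\beta - \beta_0) + O_P(\|\beta - \beta_0\|^2 + \|\theta - \theta_0\|^2)$

for any $u \in \mathcal{U}$, where $\rho_Z(u) = \rho_Z(u|\theta_0) = E[Z_{ij} | X_{ij}^T\theta_0 = u]$.

Note that

$\eta(X_{ij}^T\theta) - \eta(X_{ij}^T\theta_0) = \dot{\eta}(X_{ij}^T\theta_0) X_{ij}^T(\theta - \theta_0) + O_P(\|\theta - \theta_0\|^2),$

which, together with Lemma 3 in the supplementary material [Chen et al.
(2015)], leads to

(B.4)  $I_{n3} = -\dot{\eta}(u)\rho_X^T(u)(\theta - \theta_0) + O_P(\|\theta - \theta_0\|^2)$

for any $u \in \mathcal{U}$, where $\rho_X(u) = \rho_X(u|\theta_0) = E[X_{ij} | X_{ij}^T\theta_0 = u]$.

By a second-order Taylor expansion of $\eta(\cdot)$ and the first-order Taylor
expansion of $K(\cdot)$ used to handle $I_{n1}$, we can prove that, for any $u \in \mathcal{U}$, we
have

(B.5)  $I_{n4} = \frac{1}{2}\mu_2\ddot{\eta}(u)h^2[1 + O_P(h)] + o_P(\|\theta - \theta_0\|).$

Recall that $\hat{\beta}$ and $\hat{\theta}_1$ are the solutions to the equations in (2.8). By (B.1)-
(B.5), we can prove that, uniformly for $i = 1, \ldots, n$ and $j = 1, \ldots, m_i$,

$\hat{\eta}(X_{ij}^T\hat{\theta}_1|\hat{\beta},\hat{\theta}_1) - \eta(X_{ij}^T\theta_0)$

$\quad = \hat{\eta}(X_{ij}^T\hat{\theta}_1|\hat{\beta},\hat{\theta}_1) - \hat{\eta}(X_{ij}^T\theta_0|\hat{\beta},\hat{\theta}_1) + \hat{\eta}(X_{ij}^T\theta_0|\hat{\beta},\hat{\theta}_1) - \eta(X_{ij}^T\theta_0)$

$\quad = \dot{\hat{\eta}}(X_{ij}^T\theta_0|\hat{\beta},\hat{\theta}_1) X_{ij}^T(\hat{\theta}_1 - \theta_0) + \hat{\eta}(X_{ij}^T\theta_0|\hat{\beta},\hat{\theta}_1) - \eta(X_{ij}^T\theta_0)$

(B.6)  $\quad + O_P(\|\hat{\theta}_1 - \theta_0\|^2)$

$\quad = \dot{\eta}(X_{ij}^T\theta_0)[X_{ij} - \rho_X(X_{ij}^T\theta_0)]^T(\hat{\theta}_1 - \theta_0)(1 + o_P(1))$

$\quad + \sum_{k=1}^n s_k(X_{ij}^T\theta_0) e_k - \rho_Z^T(X_{ij}^T\theta_0)(\hat{\beta} - \beta_0)(1 + o_P(1))$

$\quad + \frac{1}{2}\mu_2\ddot{\eta}(X_{ij}^T\theta_0)h^2 + O_P(h^3) + O_P(\|\hat{\theta}_1 - \theta_0\|^2 + \|\hat{\beta} - \beta_0\|^2),$

where $s_k(X_{ij}^T\theta_0) = s_k(X_{ij}^T\theta_0|\theta_0)$.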

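By the definitions of $\hat{\beta}$ and $\hat{\theta}_1$ [see (2.8) in Section 2], we have

(B.7)  $\sum_{i=1}^n \hat{\Lambda}_i^T(\hat{\theta}_1) W_i [Y_i - Z_i\hat{\beta} - \hat{\eta}(X_i|\hat{\beta},\hat{\theta}_1)] = 0.$

By the uniform consistency results for the local linear estimators (such as
Lemmas 2 and 3 in the supplementary material [Chen et al. (2015)]), we can
approximate $\hat{\Lambda}_i(\hat{\theta}_1)$ in (B.7) by $\Lambda_i = \Lambda_i(\theta_0)$ when deriving the asymptotic
distribution theory. Then we have

$0 = \sum_{i=1}^n \hat{\Lambda}_i^T(\hat{\theta}_1) W_i [Y_i - Z_i\hat{\beta} - \hat{\eta}(X_i|\hat{\beta},\hat{\theta}_1)]$

(B.8)  $\quad = \sum_{i=1}^n \Lambda_i^T W_i [Y_i - Z_i\hat{\beta} - \hat{\eta}(X_i|\hat{\beta},\hat{\theta}_1)]$

$\quad + \sum_{i=1}^n (\hat{\Lambda}_i(\hat{\theta}_1) - \Lambda_i)^T W_i [Y_i - Z_i\hat{\beta} - \hat{\eta}(X_i|\hat{\beta},\hat{\theta}_1)]$

$\quad \sim \sum_{i=1}^n \Lambda_i^T W_i [Y_i - Z_i\hat{\beta} - \hat{\eta}(X_i|\hat{\beta},\hat{\theta}_1)] [1 + O_P(\|\hat{\theta}_1 - \theta_0\|)],$

where and below $a_n \sim b_n$ denotes $a_n = b_n(1 + o_P(1))$. Furthermore, note that

$Y_i - Z_i\hat{\beta} - \hat{\eta}(X_i|\hat{\beta},\hat{\theta}_1) = e_i - Z_i(\hat{\beta} - \beta_0) - [\hat{\eta}(X_i|\hat{\beta},\hat{\theta}_1) - \eta(X_i,\theta_0)],$

which, together with (B.6) and the bandwidth condition $\omega_n h^6 = o(1)$, implies
that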

n 

E A < W 't Y ' - ZiP-v(Xi\P,0i)] 

1=1 

n n 

= E A7W,e, -E A 7W. t Z,(3 - 0o) 

i=l i=l 

n 

- E a 7W,[77(X, |3, 00 - r?(X*, 0o)] 

Z=1 


(B.9) 


= ~E A 7wOZi - Pz(X i ,0 o )](3 -/3 0 )(1 + op(l)) 

i=l 


E ATWj{ [r,(Xi , 0 O ) 0 1J] © [Xj - p x (Xi, 0 O )]} 


i=l 


X (01 — 0o)(l + op(l)) 


+ E a 7 w * 


i=1 


^ ^ ^fc(Xi) 0o)efc 


fc=l 


+ Op (||/3 — /3 0 1| 2 + ||0i — 0o|| 2 ), 
where $s_k(X_i,\theta_0) = [s_k^T(X_{i1}^T\theta_0), \ldots, s_k^T(X_{im_i}^T\theta_0)]^T$, and $\rho_Z(X_i,\theta_0)$ and $\rho_X(X_i,\theta_0)$
were defined in Section 2. Following the standard proof in the existing
literature [see, e.g., Ichimura (1993), Chen, Gao and Li (2013b)], we can show
the weak consistency of $\hat{\beta}$ and $\hat{\theta}_1$. Note that


E A ^ W * A * 

2=1 


f P~Po\ 

\di-e 0 ) 


= E A j W i {[i?(X i , 0 O ) 0 lj] 0 [Xj - p x (X,;, 0 o )]}(0i - G 0 ) 
2=1 


+ J2AjW i [Z i -p z (X i ,0 o )]0-/3 o ) 


2=1 


and 


XNw, 


2=1 


^ ' Sfc(Xj, 0o)®fc 


Lfc=i 


= op(^ /2 ), 


which, together with (B.8) and (B.9), lead to 

~ Po \ p 


(B.10) 


1>/ W,A; 

_ 2=1 


G i — 0n 


E A * Tw * 

2=1 


Define I(0 O , B 0 ) = diag{I d , M}, O(0 o ) = (°o/ xd °eo') ’ where M = ( 0 o, B o) 
was defined in Section 3. It is easy to find that 

(B.ll) I d+P = I(0o,B o )I T (0cnB 0 ) = O(0 o )O T (0 o ) + I(B 0 )I T (B 0 ). 

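By the identification condition on $\theta_0$, we may show that

$$
\begin{aligned}
\hat\theta - \theta_0
&= \frac{\hat\theta_1}{\|\hat\theta_1\|} - \frac{\theta_0}{\|\theta_0\|}
 = \frac{\hat\theta_1 - \theta_0}{\|\hat\theta_1\|} + \frac{\theta_0}{\|\hat\theta_1\|} - \frac{\theta_0}{\|\theta_0\|} \\
&\sim \frac{\hat\theta_1 - \theta_0}{\|\theta_0\|} - \frac{\theta_0\theta_0^T(\hat\theta_1 - \theta_0)}{\|\theta_0\|^3} \\
&= (I_p - \theta_0\theta_0^T)(\hat\theta_1 - \theta_0),
\end{aligned}
$$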


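which implies that $\hat\theta - \theta_0 \sim B_0 B_0^T(\hat\theta_1 - \theta_0)$ and

$$
\begin{pmatrix} \hat\beta - \beta_0 \\ \hat\theta - \theta_0 \end{pmatrix}
\sim I(B_0) I^T(B_0)
\begin{pmatrix} \hat\beta - \beta_0 \\ \hat\theta_1 - \theta_0 \end{pmatrix}.
\tag{B.12}
$$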
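The first-order expansion behind (B.12) can be illustrated numerically. The following sketch is an assumption-laden check rather than part of the proof: it perturbs a unit vector $\theta_0$ by a small amount in a hypothetical direction, normalizes the result, and compares the exact difference with the projection $(I_p - \theta_0\theta_0^T)(\hat\theta_1 - \theta_0)$. The gap should shrink at a second-order rate as the perturbation size `eps` decreases.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 4  # hypothetical dimension of theta

theta0 = np.abs(rng.normal(size=p))
theta0 /= np.linalg.norm(theta0)           # identification: ||theta_0|| = 1
P = np.eye(p) - np.outer(theta0, theta0)   # I_p - theta_0 theta_0^T

# Assumed B_0: orthonormal basis of the orthogonal complement of theta_0.
Q, _ = np.linalg.qr(theta0.reshape(-1, 1), mode="complete")
B0 = Q[:, 1:]

# Hypothetical unit-length perturbation direction for the unnormalized estimate.
direction = rng.normal(size=p)
direction /= np.linalg.norm(direction)

def linearization_error(eps):
    """Norm of the gap between theta_1/||theta_1|| - theta_0 and its first-order form."""
    theta1 = theta0 + eps * direction
    exact = theta1 / np.linalg.norm(theta1) - theta0
    approx = P @ (theta1 - theta0)
    return np.linalg.norm(exact - approx)

errors = [linearization_error(eps) for eps in (1e-1, 1e-2, 1e-3)]
```

The projection `P` coincides with `B0 @ B0.T`, which is why the expansion can be written in terms of $B_0 B_0^T$.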


By (B.10), (B.11) and using the fact that $\Lambda_i O(\theta_0) = 0$, we have

$$
I^T(B_0) \Bigl[\sum_{i=1}^n \Lambda_i^T W_i \Lambda_i\Bigr] I(B_0) I^T(B_0)
\begin{pmatrix} \hat\beta - \beta_0 \\ \hat\theta_1 - \theta_0 \end{pmatrix}
\sim I^T(B_0) \sum_{i=1}^n \Lambda_i^T W_i e_i,
$$

which, together with (B.12), implies that

$$
\begin{pmatrix} \hat\beta - \beta_0 \\ \hat\theta - \theta_0 \end{pmatrix}
\sim I(B_0) \Bigl[I^T(B_0) \sum_{i=1}^n \Lambda_i^T W_i \Lambda_i\, I(B_0)\Bigr]^{-1} I^T(B_0) \sum_{i=1}^n \Lambda_i^T W_i e_i.
$$

Thus, by (3.1)-(3.3), the definition of the Moore-Penrose inverse and the
classical central limit theorem for independent sequences, we can show that
(3.4) in Theorem 1 holds.
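The appeal to the Moore-Penrose inverse can be made concrete with a small numerical sketch. It assumes that $A = \sum_i \Lambda_i^T W_i \Lambda_i$ is symmetric positive semi-definite with null space spanned by the direction $(0^T, \theta_0^T)^T$, and that $I(B_0) = \mathrm{diag}\{I_d, B_0\}$ has orthonormal columns spanning the complement. Under these assumptions $I(B_0)[I^T(B_0) A I(B_0)]^{-1} I^T(B_0)$ coincides with $A^+$. The matrices `Lam` and `W` below are randomly generated stand-ins, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
d, p, n, m = 2, 3, 50, 4  # hypothetical dimensions: beta, theta, subjects, visits

theta0 = np.abs(rng.normal(size=p))
theta0 /= np.linalg.norm(theta0)
Q, _ = np.linalg.qr(theta0.reshape(-1, 1), mode="complete")
B0 = Q[:, 1:]                                   # assumed orthonormal complement of theta_0
I_B0 = np.block([[np.eye(d), np.zeros((d, p - 1))],
                 [np.zeros((p, d)), B0]])        # assumed form diag{I_d, B_0}
o = np.concatenate([np.zeros(d), theta0])        # unit direction with Lambda_i o = 0

A = np.zeros((d + p, d + p))
for _ in range(n):
    Lam = rng.normal(size=(m, d + p))
    Lam -= np.outer(Lam @ o, o)                  # enforce Lambda_i O(theta_0) = 0
    G = rng.normal(size=(m, m))
    W = G @ G.T + np.eye(m)                      # arbitrary positive definite weight
    A += Lam.T @ W @ Lam

# Invert on the (d+p-1)-dimensional subspace, then map back.
reduced = I_B0.T @ A @ I_B0
via_reduction = I_B0 @ np.linalg.solve(reduced, I_B0.T)

# Direct Moore-Penrose inverse; the explicit cutoff discards the zero singular value.
via_pinv = np.linalg.pinv(A, rcond=1e-10)
```

The two matrices `via_reduction` and `via_pinv` should agree up to floating-point error, which is the algebraic fact used to pass from the last display to the limiting covariance in (3.4).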


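B.2. Proof of Corollary 1. By Theorem 1, the PULS estimators $\hat\beta$ and
$\hat\theta$ have the following asymptotic normal distribution:

$$
\omega_n^{1/2}
\begin{pmatrix} \hat\beta - \beta_0 \\ \hat\theta - \theta_0 \end{pmatrix}
\stackrel{d}{\longrightarrow} N\bigl(0, \Omega_{0*}^+ \Omega_{1*} \Omega_{0*}^+\bigr),
\tag{B.13}
$$

where $\Omega_{0*}$ and $\Omega_{1*}$ are two matrices such that

$$
\frac{1}{\omega_n} \sum_{i=1}^n \Lambda_i^T \Lambda_i \stackrel{P}{\longrightarrow} \Omega_{0*},
\qquad
\frac{1}{\omega_n} \sum_{i=1}^n E[\Lambda_i^T V_i \Lambda_i] \to \Omega_{1*},
$$

and $V_i$ is the conditional covariance matrix of $e_i$.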

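On the other hand, when the weights $W_i$, $i = 1, \ldots, n$, are chosen as the
inverse of $V_i$, by Theorem 1, we have

$$
\omega_n^{1/2}
\begin{pmatrix} \hat\beta - \beta_0 \\ \hat\theta - \theta_0 \end{pmatrix}
\stackrel{d}{\longrightarrow} N\bigl(0, \Omega_*^+\bigr),
\tag{B.14}
$$

where $\Omega_*$ is a positive semi-definite matrix such that

$$
\frac{1}{\omega_n} \sum_{i=1}^n E[\Lambda_i^T V_i^{-1} \Lambda_i] \to \Omega_*.
$$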

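In order to prove Corollary 1, by (B.13) and (B.14), we need only to
show that $\Omega_{0*}^+ \Omega_{1*} \Omega_{0*}^+ - \Omega_*^+$ is positive semi-definite. Letting
$\Theta_i = \Omega_{0*}^+ \Lambda_i^T V_i^{1/2} - \Omega_*^+ \Lambda_i^T V_i^{-1/2}$, we have

$$
\begin{aligned}
\Theta_i \Theta_i^T
&= \bigl(\Omega_{0*}^+ \Lambda_i^T V_i^{1/2} - \Omega_*^+ \Lambda_i^T V_i^{-1/2}\bigr)
   \bigl(\Omega_{0*}^+ \Lambda_i^T V_i^{1/2} - \Omega_*^+ \Lambda_i^T V_i^{-1/2}\bigr)^T \\
&= \Omega_{0*}^+ \Lambda_i^T V_i \Lambda_i \Omega_{0*}^+ - \Omega_{0*}^+ \Lambda_i^T \Lambda_i \Omega_*^+
 - \Omega_*^+ \Lambda_i^T \Lambda_i \Omega_{0*}^+ + \Omega_*^+ \Lambda_i^T V_i^{-1} \Lambda_i \Omega_*^+,
\end{aligned}
$$

which indicates that

$$
\frac{1}{\omega_n} \sum_{i=1}^n E[\Theta_i \Theta_i^T] \to \Omega_{0*}^+ \Omega_{1*} \Omega_{0*}^+ - \Omega_*^+.
\tag{B.15}
$$

As $E[\Theta_i \Theta_i^T]$ is positive semi-definite, by (B.15) we know that
$\Omega_{0*}^+ \Omega_{1*} \Omega_{0*}^+ - \Omega_*^+$ is also positive semi-definite. Hence the proof of Corollary 1 is complete.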
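A finite-sample analogue of this argument can be checked numerically. The sketch below is illustrative only: it replaces the limits $\Omega_{0*}$, $\Omega_{1*}$ and $\Omega_*$ by the corresponding sums over randomly generated stand-ins for $\Lambda_i$ and $V_i$ (hypothetical data, with every $\Lambda_i$ annihilating the direction $(0^T, \theta_0^T)^T$), and verifies that the sandwich matrix of the unweighted estimator minus the matrix obtained with $W_i = V_i^{-1}$ has no negative eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(3)
d, p, n, m = 2, 3, 60, 4  # hypothetical dimensions: beta, theta, subjects, visits

theta0 = np.abs(rng.normal(size=p))
theta0 /= np.linalg.norm(theta0)
o = np.concatenate([np.zeros(d), theta0])   # direction annihilated by every Lambda_i

A0 = np.zeros((d + p, d + p))               # sum of Lambda_i^T Lambda_i
A1 = np.zeros_like(A0)                      # sum of Lambda_i^T V_i Lambda_i
A_star = np.zeros_like(A0)                  # sum of Lambda_i^T V_i^{-1} Lambda_i
for _ in range(n):
    Lam = rng.normal(size=(m, d + p))
    Lam -= np.outer(Lam @ o, o)             # enforce Lambda_i O(theta_0) = 0
    G = rng.normal(size=(m, m))
    V = G @ G.T + np.eye(m)                 # arbitrary positive definite covariance
    A0 += Lam.T @ Lam
    A1 += Lam.T @ V @ Lam
    A_star += Lam.T @ np.linalg.solve(V, Lam)

# Explicit cutoffs discard the zero singular value along the direction o.
A0_pinv = np.linalg.pinv(A0, rcond=1e-10)
unweighted = A0_pinv @ A1 @ A0_pinv         # analogue of Omega_0*^+ Omega_1* Omega_0*^+
efficient = np.linalg.pinv(A_star, rcond=1e-10)  # analogue of Omega_*^+

gap = unweighted - efficient
gap = (gap + gap.T) / 2                     # symmetrize against rounding error
eigenvalues = np.linalg.eigvalsh(gap)
```

All entries of `eigenvalues` should be non-negative up to rounding, mirroring the conclusion that weighting by the inverse conditional covariance cannot increase the asymptotic variance.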
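B.3. Proof of Theorem 2. Note that

$$
\begin{aligned}
\hat\eta(u) - \eta(u)
&= \sum_{i=1}^n s_i(u|\hat\theta)(Y_i - Z_i\hat\beta) - \eta(u) \\
&= \sum_{i=1}^n s_i(u|\hat\theta) e_i
 + \Bigl[\sum_{i=1}^n s_i(u|\hat\theta)\eta(X_i,\theta_0) - \eta(u)\Bigr]
 + \sum_{i=1}^n s_i(u|\hat\theta) Z_i(\beta_0 - \hat\beta) \\
&\equiv I_{n1,*} + I_{n2,*} + I_{n3,*}.
\end{aligned}
\tag{B.16}
$$
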
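By Assumption 1, we have

$$
K\Bigl(\frac{X_{ij}^T\hat\theta - u}{h}\Bigr)
= K\Bigl(\frac{X_{ij}^T\theta_0 - u}{h}\Bigr)
+ \dot K\Bigl(\frac{X_{ij}^T\theta_* - u}{h}\Bigr) \frac{X_{ij}^T(\hat\theta - \theta_0)}{h},
\tag{B.17}
$$

where $\theta_* = \theta_0 + \lambda_*(\hat\theta - \theta_0)$ for some $0 < \lambda_* < 1$. By Theorem 1, we have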

$$
\|\hat\theta - \theta_0\| + \|\hat\beta - \beta_0\| = O_P(\omega_n^{-1/2}).
\tag{B.18}
$$

It follows from (B.17), (B.18) and (3.5) that 

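$$
\begin{aligned}
I_{n3,*} &= \sum_{i=1}^n s_i(u|\theta_0) Z_i(\beta_0 - \hat\beta)
 + \sum_{i=1}^n \bigl[s_i(u|\hat\theta) - s_i(u|\theta_0)\bigr] Z_i(\beta_0 - \hat\beta) \\
&= O_P(\omega_n^{-1/2}) + O_P(\omega_n^{-1}) \\
&= o_P\bigl(\varphi_n^{-1/2}(h)\bigr).
\end{aligned}
\tag{B.19}
$$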

Similar to the proof of (B.5), we can show that 

$$
I_{n2,*} = \tfrac{1}{2}\ddot\eta(u)\mu_2 h^2 (1 + o_P(1)).
\tag{B.20}
$$

For $I_{n1,*}$, note that by (B.17) and (B.18), we can show that $\sum_{i=1}^n s_i(u|\theta_0) e_i$
is the leading term of $I_{n1,*}$. Letting $z_i(\theta_0) = s_i(u|\theta_0) e_i$ and by Assumption 2,
it is easy to check that $\{z_i(\theta_0) : i \ge 1\}$ is a sequence of independent random
variables. By Assumption 2(iii), we have $E[z_i(\theta_0)] = 0$. By (3.5), (3.6) and
the central limit theorem, it can be readily seen that

$$
\varphi_n^{1/2}(h) I_{n1,*} \stackrel{d}{\longrightarrow} N(0, \sigma_*^2).
\tag{B.21}
$$

In view of (B.16), (B.19)-(B.21), the proof of Theorem 2 is complete. 
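The bias term in (B.20) has the familiar local linear form, which can be illustrated in a much simpler setting than the one treated in the proof. The sketch below makes several simplifying assumptions that are not from the paper: a fixed, equally spaced design with one observation per subject, a known index, the Epanechnikov kernel (for which $\mu_2 = 1/5$), and a noise-free quadratic link. The helper `local_linear_weights` is a hypothetical stand-in for the weights $s_i(u|\theta_0)$.

```python
import numpy as np

def epanechnikov(v):
    """Epanechnikov kernel: 0.75 * (1 - v^2) on [-1, 1], zero elsewhere."""
    return 0.75 * np.clip(1.0 - v**2, 0.0, None)

def local_linear_weights(x, u, h):
    """Weights s_i(u) such that sum_i s_i(u) * y_i is the local linear fit at u."""
    t = x - u
    k = epanechnikov(t / h)
    s0, s1, s2 = k.sum(), (k * t).sum(), (k * t**2).sum()
    return k * (s2 - s1 * t) / (s0 * s2 - s1**2)

x = np.linspace(0.0, 1.0, 2001)   # hypothetical dense, equally spaced index values
u, h = 0.5, 0.1                   # evaluation point and bandwidth (arbitrary choices)
weights = local_linear_weights(x, u, h)

def eta(v):
    """Illustrative link function with constant second derivative 2."""
    return v**2

mu2 = 0.2                                   # second moment of the Epanechnikov kernel
bias = weights @ eta(x) - eta(u)            # smoothing bias without any noise
predicted = 0.5 * 2.0 * mu2 * h**2          # (1/2) * eta''(u) * mu_2 * h^2
```

In this setting the weights sum to one and reproduce linear functions exactly, so `bias` isolates the curvature term and should be close to `predicted`.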
SUPPLEMENTARY MATERIAL

Supplement to “Semiparametric GEE analysis in partially linear single-index
models for longitudinal data” (DOI: 10.1214/15-AOS1320SUPP; .pdf).
The supplement gives the proof of Theorem 3 and some technical lemmas 
that were used to prove the main results in Appendix B. It also includes 
some additional results of our simulation studies described in Section 5. 